
Abstract

Introduction

Preoperative differentiation of adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IAC) using computed tomography (CT) is crucial for clinical management. However, accurately classifying pure ground-glass nodules (pGGNs) presents significant challenges. The quantitative integration of intratumor heterogeneity (ITH) scores may enhance the accuracy of this ternary classification. Therefore, this study aimed to develop ternary classification models to classify AIS, MIA, and IAC by leveraging insights from 15 machine-learning algorithms and integrating ITH scores with clinical data.

Methods

The ternary classification models were evaluated using an independent validation set to assess metrics, such as the macro-average area under the curve (AUC), accuracy, precision, recall, and F1 score. We subsequently applied binary classification models to various tasks derived from the optimal ternary classification model to sequentially address the discordant classifications.

Results

In this retrospective study, a total of 512 pGGNs were randomly divided into training and validation sets at a ratio of 7:3. Among the 15 models, the light gradient boosting machine (LightGBM) exhibited the best predictive performance as a ternary classification model, achieving a macro-average AUC and an accuracy of 0.808 and 0.630, respectively. Upon binary classification, the model achieved a respective AUC and accuracy of 0.839 and 0.630 for classifying AIS, 0.677 and 0.620 for classifying MIA, and 0.908 and 0.780 for classifying IAC.

Conclusion

The LightGBM model, identified as the optimal algorithm for integrating ITH scores with clinical data, effectively serves as a ternary classification model for assessing adenocarcinoma invasiveness on chest CT.

Keywords

intratumor heterogeneity score, explainable machine learning, ternary classification model, binary classification model, pure ground-glass nodules, invasiveness

Introduction

Lung cancer is the most commonly diagnosed malignancy worldwide, with adenocarcinoma representing its most frequent histological subtype. Early-stage lung adenocarcinoma (LUAD) typically manifests as pure ground-glass nodules (pGGNs) on CT imaging.¹ Although pGGNs generally exhibit relatively indolent behavior, this characteristic may contribute to overdiagnosis and overtreatment, making it crucial to ensure appropriate management.^2,3 Management strategies for pGGNs are primarily dictated by the degree of invasiveness of the underlying pathology. The optimal timing for limited resection can be established through follow-up evaluations for adenocarcinoma in situ (AIS). However, minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IAC) require immediate surgical intervention. The treatment options for AIS or MIA may include sub-lobectomy techniques, such as wedge resection or segmentectomy, whereas IAC is typically managed with standard therapies involving anatomical lobectomy and systematic mediastinal lymph node dissection.^4–6 Preoperative assessment of LUAD invasiveness using CT scans significantly affects clinical decision-making; however, this assessment can be challenging because pGGNs share similar CT radiological findings.

To address this problem, numerous studies have developed models using methods such as radiomics and deep learning,^7–10 with the aim of achieving prediction performance levels comparable to those of seasoned radiologists. However, existing radiomics and deep learning models have largely focused on computer vision methods and exhibit certain limitations. First, radiomics identifies individual or combined computational features as biomarkers; however, the application of radiomic features assumes uniformly distributed heterogeneity, which does not effectively capture the local characteristics of tumors.^7,8 Second, deep-learning algorithms can be computationally expensive, require large volumes of labeled data, and may be prone to overfitting. They typically lack interpretability, which makes it difficult to understand their decision-making processes, and are vulnerable to adversarial attacks.^9,10 Third, the invasiveness of MIAs is intermediate between that of AIS and IAC, necessitating distinct clinical management strategies compared to other subtypes.^11,12 However, most studies have limited their outcomes to binary classifications (AIS/MIA vs IAC^13,14 or AIS vs MIA/IAC^15,16), and only a few studies have developed ternary classification models (AIS vs MIA vs IAC).^17,18

The quantitative evaluation of intratumoral heterogeneity (ITH) in CT images, known as the ITH score, is an innovative method for analyzing the biological behavior of lung nodules. Specifically, the ITH score is a multi-scale radiomics metric derived from clustering label maps, which helps reduce variability and the complexity of feature dimensionality across CT images from diverse samples while preserving intrinsic heterogeneity. Li et al¹⁹ demonstrated that a higher ITH score is associated with poorer prognosis in patients with non-small cell lung cancer. Building on this, Zheng et al²⁰ showed that combining the ITH score with clinical data using a machine learning approach can effectively predict the pathological invasiveness of LUADs presenting as pGGNs. Qi et al²¹ further found that the diagnostic efficacy of the ITH score is comparable to that of radiomics analysis for predicting invasiveness in these lesions. Moreover, Zhang et al²² indicated that the ITH score outperforms traditional radiomic approaches in distinguishing the histological subtypes of LUADs manifesting as pGGNs.

In this study, we used an ITH score approach to assess intratumoral heterogeneity in CT images, building upon the methodologies established in previous research. We also applied 15 machine-learning algorithms to integrate CT radiological findings with the ITH score, resulting in the development of ternary classification models that enhanced the predictive ability for the invasiveness of adenocarcinoma manifesting as pGGNs. Furthermore, three binary classification tasks were integrated using a rule-based approach to derive ternary labels for predicting AIS, MIA, and IAC. This is the first study to develop machine learning-based models for the ternary classification of LUAD invasiveness in pGGNs (AIS, MIA, and IAC), aiming to enhance clinical management and therapy selection. It introduces a novel approach for the grading of pGGN invasiveness, addressing a key gap in radiological assessment and providing a new tool for precise diagnosis.

Methods

Enrollment of Patients

The reporting of this study conforms to the TRIPOD guidelines for prediction model development and validation.²³

In this retrospective study, we analyzed the clinical data, pathological findings, and CT images of patients who underwent surgical resection for LUAD at the Affiliated Hospital of Southwest Medical University from June 2018 to June 2023. The study protocol was approved by the Ethics Committee of the Affiliated Hospital of Southwest Medical University (Approval No. KY2023147), with a review date of August 12, 2020. Additionally, the requirement for written informed consent was waived due to the retrospective nature of the study.

The inclusion criteria were (1) the presence of pGGNs on chest CT and a maximum lesion diameter ≤ 30 mm; (2) the availability of thin-slice CT images (≤ 4 mm); and (3) postoperative pathological diagnosis of primary LUAD. The exclusion criteria were (1) having received preoperative chemotherapy or radiotherapy; (2) the presence of pulmonary metastases or multiple primary lung cancers; and (3) CT images of inferior quality due to noise, respiratory artifacts, or other movement artifacts.

After a thorough screening, we recruited 512 patients with LUADs that manifested as pGGNs on chest CT images. These individuals were randomly allocated to the training (n = 358) and validation (n = 154) sets, adhering to a ratio of 7:3. Supplementary Figure S1 shows the flowchart of the enrolled patients’ screening process.
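As a rough illustration of the 7:3 allocation, the sketch below shuffles 512 placeholder patient indices and assigns the first 70% (rounded) to the training set. The function name, the fixed seed, and the use of simple unstratified shuffling are assumptions made for illustration; the text does not state how the randomization was implemented.

```python
import random


def split_indices(n_total, train_fraction=0.7, seed=42):
    """Randomly partition range(n_total) into training and validation index lists."""
    indices = list(range(n_total))
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(indices)
    n_train = round(n_total * train_fraction)  # 0.7 * 512 = 358.4, which rounds to 358
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(512)
print(len(train_idx), len(valid_idx))
```

With 512 patients, this rounding reproduces the reported set sizes of 358 and 154.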

Analysis of the Radiological Findings

Supplementary Digital Content S1 presents the CT acquisition protocols. The DICOM-formatted scans were analyzed using RadiAnt DICOM Viewer (https://www.radiantviewer.com) to evaluate the radiological features. Two board-certified cardiothoracic radiologists, each with over five years of experience, independently evaluated the chest CT images. Discrepancies were resolved through joint discussion. The radiological findings evaluated included location, lesion size, shape, boundary, lobulation sign, spiculation sign, vascular convergence sign, vacuole sign, and pleural indentation sign.

Lesion Segmentation

All pGGNs were manually contoured using ITK-SNAP (v4.0.0). Lesion layers were combined to reconstruct each nodule. Images were magnified during segmentation to ensure precise tracing along tumor margins, carefully excluding nearby pulmonary structures such as vessels, airways, and pleural tissue. An experienced cardiothoracic radiologist with over 5 years in chest CT initially delineated the boundaries on axial CT slices, which were then independently reviewed and refined by a second senior specialist with over 10 years of experience to ensure accuracy.

ITH Score Calculation

The ITH score serves as a quantitative metric for assessing tumor heterogeneity, achieved by integrating local pixel characteristics with global pixel distribution patterns extracted from CT images, as demonstrated in prior studies.^19–22 Pixel properties are systematically grouped into clusters according to feature similarity, with the optimal number of clusters determined to be five through k-means clustering. Following this, pixel-level feature extraction is performed, whereby a total of 104 pixel-level features are computed for each pixel within the tumor using a 2 × 2 window to capture local pixel characteristics. These per-pixel properties are uniformly computed throughout the lesion to capture spatially specific patterns. Pixels assigned to the same clusters exhibit similar signal intensities and texture profiles. Consequently, the ITH score, formalized in Equation 1, provides a quantitative measure of this diversity across the label maps.

$$\mathrm{ITH\ score} = 1 - \frac{1}{S_{\mathrm{total}}} \sum_{i=1}^{V} \frac{S_{i,\max}}{n_{i}}, \tag{1}$$

where V represents the number of clusters, and S_total denotes the total area of the lesion. The parameter n_i indicates the number of connected regions, and S_i,max refers to the largest area of a connected region within cluster i. Additionally, n_i and S_i,max are pivotal for assessing the diversity of patterns within each cluster. A greater number of connected regions and a smaller maximum area imply an increased variety of patterns. The ITH score, which ranges from 0 to 1, signifies greater diversity in the cell composition and spatial distribution at higher values.
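Equation 1 can be illustrated with a minimal pure-Python sketch that operates on a two-dimensional cluster label map. Several details here are assumptions rather than the authors' implementation: regions are taken to be 4-connected, the label 0 marks pixels outside the lesion, areas are counted in pixels, and the function names are hypothetical.

```python
from collections import deque


def connected_regions(label_map, cluster):
    """Return the pixel areas of the 4-connected regions belonging to one cluster."""
    rows, cols = len(label_map), len(label_map[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if label_map[r][c] != cluster or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited pixel of this cluster.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and label_map[ny][nx] == cluster):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def ith_score(label_map, background=0):
    """ITH score = 1 - (1 / S_total) * sum over clusters of (S_i,max / n_i)."""
    lesion_pixels = [v for row in label_map for v in row if v != background]
    s_total = len(lesion_pixels)
    if s_total == 0:
        raise ValueError("label map contains no lesion pixels")
    total = 0.0
    for cluster in set(lesion_pixels):
        areas = connected_regions(label_map, cluster)
        total += max(areas) / len(areas)  # S_i,max divided by n_i
    return 1.0 - total / s_total


homogeneous = [[1, 1], [1, 1]]   # one cluster forming a single region
checkerboard = [[1, 2], [2, 1]]  # two clusters, each split into two isolated pixels
print(ith_score(homogeneous), ith_score(checkerboard))
```

In the homogeneous map, the single region covers the whole lesion, so the score is 0. In the checkerboard, each cluster contributes 1/2, giving 1 − (0.5 + 0.5)/4 = 0.75, consistent with the statement that more connected regions and smaller maximum areas raise the score.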

Ternary Classification Models

We employed 15 supervised machine-learning algorithms for the ternary classification models to predict the invasiveness of LUAD manifesting as pGGNs. These algorithms, implemented in the Scikit-learn package (https://github.com/scikit-learn/scikit-learn), included logistic regression, decision tree, random forest, k-nearest neighbor, multinomial naïve Bayes, adaptive boosting, light gradient boosting machine (LightGBM), extreme gradient boosting (XGBoost), categorical boosting, gradient boosting decision tree, linear discriminant analysis (LDA), quadratic discriminant analysis, extremely randomized trees, support vector machine (SVM), and multilayer perceptron. The ITH score, clinical data (age and sex), and CT radiological findings were input into these classifiers.

Finally, we employed an automated GridSearch approach utilizing five-fold cross-validation to eliminate randomness in parameter selection. This method allows for the automatic identification of optimal parameters for all required model fittings, as detailed in Supplementary Digital Content S2.
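The idea of grid search with five-fold cross-validation can be sketched as follows. This is not the study's pipeline: the one-dimensional k-nearest-neighbor classifier, the synthetic data, the candidate grid, and all function names are stand-ins chosen so that the example runs with the Python standard library alone. The actual parameter grids are those listed in Supplementary Digital Content S2.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, query, k):
    """Toy 1-D k-nearest-neighbor classifier (stand-in for any tunable model)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    votes = [train_y[i] for i in nearest]
    return max(sorted(set(votes)), key=votes.count)


def cross_val_accuracy(x, y, k, folds):
    """Mean held-out accuracy of the k-NN model over the given folds."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(x)) if i not in held]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_x, train_y, x[i], k) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def grid_search(x, y, grid, n_folds=5):
    """Score every candidate on the same folds and return the best one."""
    folds = k_fold_indices(len(x), n_folds)
    results = {k: cross_val_accuracy(x, y, k, folds) for k in grid}
    best = max(results, key=results.get)
    return best, results


# Synthetic, well-separated data: three classes centered at 0, 5, and 10.
rng = random.Random(1)
x = [center + rng.uniform(-1, 1) for center in (0, 5, 10) for _ in range(20)]
y = [label for label in (0, 1, 2) for _ in range(20)]
best_k, cv_results = grid_search(x, y, grid=[1, 3, 5])
print(best_k, cv_results)
```

Evaluating every candidate on the same folds keeps the comparison between parameter settings from depending on a different random partition for each setting.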

Binary Classification Models

The diagnostic efficiencies of the 15 ternary classification models were evaluated using various metrics, including the F1 score, precision, recall, and accuracy. Specifically, the model that achieved the best overall performance was designated as the optimal classifier. We also employed three binary classification models to enhance the optimal classifier through a rule-based approach for assigning ternary labels to predict AIS, MIA, and IAC. These tasks comprised binary classification tasks 1 (AIS vs MIA and IAC), 2 (MIA vs AIS and IAC), and 3 (IAC vs AIS and MIA). Consequently, three binary classification outcomes were generated for each pGGN, which were logically fused based on the rules established in the truth table derived from the ternary labels. Figure 1 presents the conceptual framework of this study.

Figure 1.

Conceptual Framework of the Study.
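The fusion of the three binary outputs into one ternary label can be sketched as below. The truth table itself is not reproduced in this text, so the rule used here is a hypothetical one: if exactly one task votes positive, its class is taken; otherwise the class with the largest positive-class probability wins. The threshold of 0.5 and the function name are likewise assumptions.

```python
CLASSES = ("AIS", "MIA", "IAC")


def fuse_binary_outputs(probabilities, threshold=0.5):
    """Fuse three one-vs-rest probabilities into a single ternary label.

    probabilities maps each class to the positive-class probability of its
    binary task (task 1: AIS, task 2: MIA, task 3: IAC).
    Assumed rule: a single positive vote decides the label; with no vote or
    conflicting votes, fall back to the largest probability.
    """
    votes = [c for c in CLASSES if probabilities[c] >= threshold]
    if len(votes) == 1:
        return votes[0]
    return max(CLASSES, key=lambda c: probabilities[c])


print(fuse_binary_outputs({"AIS": 0.8, "MIA": 0.3, "IAC": 0.1}))  # single vote
print(fuse_binary_outputs({"AIS": 0.2, "MIA": 0.4, "IAC": 0.3}))  # no vote
print(fuse_binary_outputs({"AIS": 0.1, "MIA": 0.6, "IAC": 0.7}))  # conflicting votes
```

A fallback of this kind is one way to resolve discordant binary predictions; other tie-breaking rules would be equally compatible with the description in the text.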

Model Interpretation

SHapley Additive exPlanations (SHAP) was employed to interpret the machine-learning models, providing both global and local interpretations.²⁴ The global interpretation provides reliable and consistent attribution values for each feature within the model, reflecting the connections between the input features and their outcomes. These relationships were visualized using summary plots for the entire population and waterfall plots for specific individuals. The local interpretation illustrates the variation across the entire study population when a single factor is altered, with all other factors held constant. This was visualized using individual conditional expectation (ICE) plots and partial dependence plots (PDPs).

Statistical Analysis

All statistical analyses were conducted using R version 4.4.1 (https://www.r-project.org). Comparisons between the training and validation sets were performed using the “CBCgrps” and “nortest” packages. Continuous variables were evaluated using an independent sample t-test or the Wilcoxon rank-sum test, while categorical variables were examined using the chi-square test. Statistical significance was set at P < 0.05.

Results

Patient Characteristics

We enrolled 512 patients with LUAD presenting as pGGNs. Among these, 155 (30.3%), 172 (33.6%), and 185 (36.1%) were pathologically diagnosed with AIS, MIA, and IAC, respectively. The patients were randomly assigned to the training and validation sets at a 7:3 ratio. In the training set, comprising 358 patients, 104 (29.1%), 124 (34.6%), and 130 (36.3%) were diagnosed with AIS, MIA, and IAC, respectively. Similarly, in the validation set, which included 154 patients, 51 (33.1%), 48 (31.2%), and 55 (35.7%) were diagnosed with AIS, MIA, and IAC, respectively. The detailed characteristics of the training and validation sets were recorded and compared (Table 1). No significant differences were observed between the two sets, with all P-values exceeding 0.05, indicating that the groups were comparable.

Table 1.

Baseline Characteristics of the Training and Validation Sets.

Variables	Total (n = 512)	Training set (n = 358)	Validation set (n = 154)	P value
Age (y), Median (Q1, Q3)	55 (48, 65)	56 (48, 65)	54 (48, 63.8)	0.159
Sex, n (%)				0.234
Male	150 (29.3)	111 (31)	39 (25.3)
Female	362 (70.7)	247 (69)	115 (74.7)
Lesion size (mm), Median (Q1, Q3)	13.4 (9.8, 18.5)	13.2 (9.8, 18.4)	13.7 (10.1, 18.7)	0.623
CT value (HU), Median (Q1, Q3)	−613.8 (−672.4, −539.6)	−614.3 (−674.9, −550)	−610 (−666.8, −529.9)	0.552
Location, n (%)				0.118
Left upper lobe	178 (34.8)	125 (34.9)	53 (34.4)
Left lower lobe	37 (7.2)	20 (5.6)	17 (11)
Right upper lobe	73 (14.3)	56 (15.6)	17 (11)
Right middle lobe	155 (30.3)	105 (29.3)	50 (32.5)
Right lower lobe	69 (13.5)	52 (14.5)	17 (11)
Margin, n (%)				0.538
Ill-Defined	422 (82.4)	298 (83.2)	124 (80.5)
Well-Defined	90 (17.6)	60 (16.8)	30 (19.5)
Shape, n (%)				0.537
Irregular	367 (71.7)	260 (72.6)	107 (69.5)
Round or oval	145 (28.3)	98 (27.4)	47 (30.5)
Lobulation sign, n (%)				1.000
Absence	331 (64.6)	231 (64.5)	100 (64.9)
Presence	181 (35.4)	127 (35.5)	54 (35.1)
Spiculation sign, n (%)				1.000
Absence	379 (74)	265 (74)	114 (74)
Presence	133 (26)	93 (26)	40 (26)
Vascular convergence sign, n (%)				0.103
Absence	127 (24.8)	81 (22.6)	46 (29.9)
Presence	385 (75.2)	277 (77.4)	108 (70.1)
Vacuole sign, n (%)				0.605
Absence	441 (86.1)	306 (85.5)	135 (87.7)
Presence	71 (13.9)	52 (14.5)	19 (12.3)
Pleural indentation sign, n (%)				0.222
Absence	318 (62.1)	229 (64)	89 (57.8)
Presence	194 (37.9)	129 (36)	65 (42.2)
ITH score, Median (Q1, Q3)	0.6 (0.4, 0.7)	0.6 (0.4, 0.7)	0.6 (0.4, 0.7)	0.762
Pathologic diagnosis, n (%)				0.611
AIS	155 (30.3)	104 (29.1)	51 (33.1)
MIA	172 (33.6)	124 (34.6)	48 (31.2)
IAC	185 (36.1)	130 (36.3)	55 (35.7)

Abbreviations: AIS, adenocarcinoma in situ; MIA, minimally invasive adenocarcinoma; IAC, invasive adenocarcinoma; ITH, intratumoral heterogeneity.

Diagnostic Performance of the Ternary Classification Models

Among the 15 machine learning-based ternary classification models used to predict the invasiveness of LUAD manifesting as pGGNs, LightGBM achieved the highest accuracy and recall of 0.630 and 0.617, respectively. LDA had the highest precision of 0.618, while XGBoost had the highest F1 score of 0.608. LightGBM was determined to be the optimal ternary classifier, considering the overall accuracy of the models and following a thorough evaluation of the data presented in Table 2 and the visualizations in Figure 2.

Figure 2.

Line graphs depicting the accuracy, recall, precision, and F1 score across the 15 machine learning-based ternary classification models.

Table 2.

Diagnostic Performance of Different Machine Learning Models.

Model	Accuracy	Recall	Precision	F1 score
LR	0.539	0.531	0.545	0.532
DT	0.591	0.581	0.563	0.568
RF	0.571	0.558	0.550	0.546
KNN	0.461	0.454	0.454	0.452
MultinomialNB	0.506	0.503	0.540	0.496
AdaBoost	0.545	0.535	0.549	0.521
LightGBM	0.630	0.617	0.600	0.601
XGBoost	0.623	0.614	0.606	0.608
CatBoost	0.552	0.542	0.540	0.538
GradientBoost	0.610	0.600	0.587	0.589
LDA	0.610	0.602	0.618	0.602
QDA	0.591	0.583	0.583	0.582
ExtRa Trees	0.604	0.593	0.588	0.585
SVM	0.578	0.573	0.589	0.576
MLP	0.500	0.491	0.504	0.475

Abbreviations: LR, logistic regression; DT, decision tree; RF, random forest; KNN, k-nearest neighbor; MultinomialNB, multinomial naïve Bayes; AdaBoost, adaptive boosting; LightGBM, light gradient boosting machine; XGBoost, extreme gradient boosting; CatBoost, categorical boosting; GradientBoost, gradient boosting decision tree; LDA, linear discriminant analysis; QDA, quadratic discriminant analysis; ExtRa Trees, extremely randomized trees; SVM, support vector machine; MLP, multilayer perceptron.

Figure 3 shows the three-class receiver operating characteristic (ROC) curve for LightGBM's ternary classification. The top two lines represent the micro-average and macro-average ROC curves. Macro-averaging computes the metric of interest for each label independently and averages it across labels, whereas micro-averaging pools the contributions of all labels before computing the metric. The areas under the curve (AUCs) for the micro-average and macro-average ROC curves were 0.827 and 0.808, respectively. Figure 4 shows the heat map of the confusion matrix for LightGBM's ternary classification, displaying the agreement between the predicted and true labels.

Figure 3.

Three-Class Receiver Operating Characteristic Curve for Ternary Classification Using the Light Gradient Boosting Machine.

Figure 4.

Heat map of the confusion matrix for ternary classification using the light gradient boosting machine.
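The difference between macro- and micro-averaging described above can be made concrete with a small sketch. It applies both schemes to recall and precision rather than to ROC curves, because these can be read directly from a confusion matrix. The matrix below is a made-up, deliberately imbalanced example and is not the matrix shown in Figure 4; the function names are hypothetical.

```python
def per_class_counts(confusion):
    """Return (tp, fp, fn) for each class of a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    counts = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # rest of row k
        fp = sum(confusion[i][k] for i in range(n)) - tp  # rest of column k
        counts.append((tp, fp, fn))
    return counts


def macro_micro(confusion):
    """Compute macro- and micro-averaged recall and precision."""
    counts = per_class_counts(confusion)
    recalls = [tp / (tp + fn) for tp, fp, fn in counts]
    precisions = [tp / (tp + fp) for tp, fp, fn in counts]
    tp_sum = sum(tp for tp, fp, fn in counts)
    fp_sum = sum(fp for tp, fp, fn in counts)
    fn_sum = sum(fn for tp, fp, fn in counts)
    return {
        # Macro: average the per-class metrics, weighting every class equally.
        "macro_recall": sum(recalls) / len(recalls),
        "macro_precision": sum(precisions) / len(precisions),
        # Micro: pool the counts first, so larger classes weigh more.
        "micro_recall": tp_sum / (tp_sum + fn_sum),
        "micro_precision": tp_sum / (tp_sum + fp_sum),
    }


# Illustrative matrix only (rows: true class, columns: predicted class).
toy_confusion = [
    [18, 2, 0],
    [3, 4, 3],
    [0, 1, 9],
]
metrics = macro_micro(toy_confusion)
print(metrics)
```

For single-label multiclass problems such as this one, micro-averaged recall and precision both equal the overall accuracy (here 31/40), whereas the macro averages are pulled down by the poorly recognized middle class.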

Diagnostic Performance of the Binary Classification Models

The three binary classification tasks comprised tasks 1 (AIS vs MIA/IAC), 2 (MIA vs AIS/IAC), and 3 (IAC vs AIS/MIA). Figure 5 illustrates the ROC curves and confusion matrices. The binary classification models achieved better performance for tasks 1 and 3 than for task 2. Specifically, the binary classification model achieved an AUC and accuracy of 0.839 and 0.630, respectively, for classifying AIS (task 1), whereas the model achieved an AUC of 0.908 and an accuracy of 0.780 for classifying IAC (task 3). The model demonstrated a lower performance, yielding an AUC and accuracy of 0.677 and 0.620, respectively, for classifying MIA (task 2).

Figure 5.

Receiver Operating Characteristic Curves and Heat Maps of Confusion Matrices for the Three Binary Classification Tasks: Tasks 1 (AIS vs MIA/IAC), 2 (MIA vs AIS/IAC), and 3 (IAC vs AIS/MIA). AIS, Adenocarcinoma in Situ; MIA, Minimally Invasive Adenocarcinoma; IAC, Invasive Adenocarcinoma.
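For readers unfamiliar with how an AUC is obtained for a one-vs-rest task such as task 1 (AIS vs MIA/IAC), the following minimal sketch computes it as the fraction of positive-negative pairs in which the positive sample receives the higher score, counting ties as one half. The labels and scores are invented for illustration, and the function name is hypothetical.

```python
def one_vs_rest_auc(true_labels, scores, positive_class):
    """AUC of one class against all others, computed by pairwise comparison."""
    positives = [s for y, s in zip(true_labels, scores) if y == positive_class]
    negatives = [s for y, s in zip(true_labels, scores) if y != positive_class]
    if not positives or not negatives:
        raise ValueError("both positive and negative samples are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Invented example: predicted probability of AIS for five nodules.
labels = ["AIS", "MIA", "IAC", "AIS", "MIA"]
ais_scores = [0.9, 0.4, 0.2, 0.35, 0.1]
auc_ais = one_vs_rest_auc(labels, ais_scores, "AIS")
print(round(auc_ais, 3))
```

Here five of the six positive-negative pairs are ordered correctly, so the AUC is 5/6.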

Model Interpretation

The SHAP method was used to obtain global and local interpretations of the LightGBM model. The global explanations are illustrated in SHAP summary plots (Figures 6a and 6b), where the average SHAP value was used to evaluate the contribution of each feature across the entire population. The ITH score and lesion size contributed more than the other variables: the ITH score made the largest contribution to the ternary classification, followed by lesion size. Furthermore, in the binary classification models, the ITH score and lesion size discriminated poorly for MIA, which has intermediate invasiveness, but strongly for AIS, with low invasiveness, and IAC, with high invasiveness. To illustrate the predictive outcome for a specific sample, we referred to the waterfall plots in Supplementary Figure S2. Using the prediction of AIS as an example, $E[f(x)]$ represents the model's baseline value, which is the expected model output (−1.417), whereas $f(x)$ represents the final predicted value for this specific sample (−2.16). In this context, the ITH score and lesion size contributed negatively, with values of −0.57 and −0.17, respectively.

Figure 6.

SHapley Additive exPlanations summary plots illustrating global explanations: (6a) two-dimensional and (6b) three-dimensional bar graphs.
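The waterfall values reported for the AIS example can be checked against the additivity property of SHAP, under which the prediction equals the baseline plus the sum of all feature contributions. In the worked equation below, $\phi_j$ denotes the SHAP value of feature $j$; this notation and the grouping of the remaining features into a single term are introduced here for illustration only.

```latex
f(x) = E[f(x)] + \sum_{j} \phi_j

\sum_{j \notin \{\mathrm{ITH},\,\mathrm{size}\}} \phi_j
  = f(x) - E[f(x)] - \phi_{\mathrm{ITH}} - \phi_{\mathrm{size}}
  = -2.16 - (-1.417) - (-0.57) - (-0.17)
  = -0.003
```

Because the reported values are rounded, a remainder this small suggests that, for this sample, the ITH score and lesion size account for essentially all of the shift from the baseline.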

Local explanations were visualized using ICE and PDP, taking the AIS prediction as an example. The ICE plot (Figure 7) illustrates the changing paths of each sample's AIS prediction across different value ranges of the ITH score and lesion size. As an overall trend, the likelihood of AIS increased with decreasing ITH score and lesion size. The PDP (Figure 8) simultaneously considered the value ranges of the ITH score and lesion size, displaying the impact of their interaction on predicting the dependent variable. Additionally, the axes represent the two features, and the contour lines show the predicted values of the model for different combinations of these features. Each contour line represents a specific predicted value and the model's predicted value remained constant along the contour line. The shapes and distributions of the contour lines revealed how the two features interacted and influenced the predictions of the model. Color mapping indicated the magnitude of the model's predicted values, with darker colors signifying an increased likelihood of AIS as the ITH score and lesion size decreased. Notably, the interaction between the ITH score and lesion size was most evident when the ITH score was approximately 0.5.

Figure 7.

Individual Conditional Expectation Plots Showing the Changing Prediction Paths of Each Sample Across the Varying Value Ranges of the ITH Score and Lesion Size.

Figure 8.

Partial dependence plots illustrate the interaction effects of the ITH score and lesion size on the prediction of the dependent variable, considering their value ranges simultaneously.
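The relationship between ICE curves and a partial dependence curve can be sketched in a few lines: each ICE curve sweeps one feature for a single sample while holding its other features fixed, and the partial dependence curve is the point-by-point average of those curves. The logistic toy model, its coefficients, the three samples, and the feature names below are all hypothetical and merely mimic the reported trend; the sketch is one-dimensional, whereas Figure 8 varies two features jointly.

```python
import math


def toy_ais_model(sample):
    """Hypothetical stand-in model: AIS probability falls as both features rise."""
    z = 3.0 - 4.0 * sample["ith_score"] - 0.1 * sample["lesion_size_mm"]
    return 1.0 / (1.0 + math.exp(-z))


def ice_curves(model, samples, feature, grid):
    """For each sample, sweep one feature over the grid, holding the others fixed."""
    curves = []
    for sample in samples:
        curve = []
        for value in grid:
            modified = dict(sample)   # copy so the original sample is untouched
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """Average the ICE curves point by point."""
    return [sum(column) / len(curves) for column in zip(*curves)]


samples = [
    {"ith_score": 0.4, "lesion_size_mm": 9.8},
    {"ith_score": 0.6, "lesion_size_mm": 13.4},
    {"ith_score": 0.7, "lesion_size_mm": 18.5},
]
grid = [0.2, 0.5, 0.8]
curves = ice_curves(toy_ais_model, samples, "ith_score", grid)
pdp = partial_dependence(curves)
print([round(value, 3) for value in pdp])
```

Plotting each entry of `curves` gives one ICE line per sample, and plotting `pdp` gives the averaged trend.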

Discussion

We used the ITH score approach to assess tumor heterogeneity in CT images and to serve as a predictor of invasiveness in LUAD presenting as pGGNs. Subsequently, we applied 15 machine-learning algorithms to integrate the CT radiological findings with the ITH score; among the resulting ternary classification models, LightGBM exhibited the best predictive ability. We also developed three binary classification models, which achieved higher performance in classifying AIS, with low invasiveness, and IAC, with high invasiveness (tasks 1 and 3, respectively). In contrast, the model demonstrated lower performance in classifying MIA, with intermediate invasiveness (task 2).

Previous studies have shown that the invasiveness of pGGNs is primarily associated with lesion size, rather than with CT morphological features such as shape, margin, lobulation sign, or spiculation sign.^14,20,21,25–27 Specifically, Qi et al²¹ reported that tumor size (OR 1.28; 95% CI: 1.20-1.36; P < 0.001) serves as an independent predictor capable of distinguishing IAC in LUADs presenting as pGGNs. Furthermore, it is recommended that the management of pGGNs be based on lesion size, in accordance with the guidelines established by the eighth edition of the International Association for the Study of Lung Cancer Lung Cancer Staging Project.²⁸

Similarly, in this study, the global explanations for the established LightGBM model using the SHAP method indicated that lesion size was the second most important factor after the ITH score, with both factors significantly outperforming the other variables. This suggests that lesion size has a greater influence on the invasiveness of pGGNs than other morphological features, which aligns with the findings of previous studies.^14,20,25 However, lesion size is limited in objectivity and cannot completely represent biological characteristics; therefore, it performs slightly worse compared with the ITH score, which was adopted in the present study.

The ITH score, derived from clustering label maps, aids in reducing variations among the images of different samples while preserving their inherent heterogeneity. It effectively addresses the biases associated with complex feature dimensionality reductions in radiomics analysis by generating quantitative data from intuitive cluster patterns. Furthermore, this score provides an objective evaluation of tumor heterogeneity by incorporating pixel characteristics and their spatial distributions. This represents a significant advancement in capturing multi-scale tumor heterogeneity through the integration of local and global pixel information. The ITH score illustrates substantial clinical utility in lung cancer management. Specifically, it offers prognostic value in non-small cell lung cancer¹⁹ and facilitates the assessment of pathological invasiveness and histological subtyping in LUADs presenting as pGGNs.^20–22 To the best of our knowledge, this is the first study to use the ITH score for a ternary classification model (AIS vs MIA vs IAC) as a predictor of invasiveness in LUAD presenting as pGGNs.

In evaluating 15 machine learning models for predicting invasiveness in LUAD presenting as pGGNs, LightGBM achieved the highest accuracy (0.630) and recall (0.617), outperforming competitors such as XGBoost and SVM. This superiority is attributed to its efficient gradient-based sampling and feature optimization, which facilitate robust performance with high-dimensional radiomic data. While XGBoost achieved a slightly better F1 score (0.608) and LDA achieved the highest precision (0.618), the balance of accuracy and recall provided by LightGBM is clinically critical for minimizing the risk of underdiagnosing invasive tumors. In contrast to XGBoost and SVM, which struggle with nonlinear CT texture patterns, and LDA, which operates under the assumption of unrealistic data distributions, LightGBM's architecture and computational efficiency render it the most practical choice for clinical translation.²⁹

The SHAP summary plots provide global explanations for the established LightGBM ternary classification, demonstrating that the ITH score and lesion size contributed markedly more than the other variables. Local explanations were visualized using ICE and PDP. The ICE plot indicated that, as an overall trend, the likelihood of AIS increased with decreasing ITH score and lesion size. The PDP simultaneously considered the value ranges of the ITH score and lesion size, displaying an increased likelihood of AIS as both variables decreased. Notably, the contour lines were densest when the ITH score was approximately 0.5, indicating that the interaction between the two variables in predicting the outcome was strongest within this interval.

The performance variation across classification tasks reflects inherent biological and diagnostic challenges. LightGBM-derived binary classifiers excelled in distinguishing the extremes of invasiveness, specifically between AIS and IAC, but struggled with the intermediate category of MIA. This difficulty likely arises from the transitional state of MIA, characterized by minimal invasion (≤5 mm) and subtle CT texture information. Such biological ambiguity fosters imaging heterogeneity and overlaps in features with adjacent categories, complicating effective discrimination.³⁰ While current models may insufficiently capture the nuanced pathology of MIA, they demonstrate strength in clearly distinguishing non-invasive AIS from high-risk IAC. This distinction is critical for clinical decision-making, as it guides the choice between conservative monitoring and surgical intervention for pGGNs.

This study has several limitations that merit consideration. First, our models were developed using CT images of pGGNs, which may not be applicable to mixed ground-glass and solid nodules, thereby limiting the generalizability of our conclusions. To address this, future research should prospectively collect and analyze imaging data from patients with histologically confirmed mixed GGNs or solid nodules. Second, the lack of comprehensive smoking data for the study participants hindered our assessment of the influence of smoking on the model's efficacy. Future validation cohorts should implement standardized prospective collection of detailed smoking metrics through structured interviews and biochemical validation (eg, serum cotinine levels). Third, the retrospective selection of participants may have introduced bias. Incorporating follow-up data would enhance our ability to effectively differentiate the LUAD. Finally, the current model was specifically designed and optimized for internal clinical use within homogeneous data environments, where imaging protocols, patient populations, and annotation standards are controlled. However, its generalizability to external datasets may be limited. To address this limitation, further validation with external datasets, along with leveraging subsets of TCIA data that include multi-omics integration (eg, genomic or transcriptomic profiles), could enhance feature robustness by assessing model performance under controlled heterogeneity.

Conclusion

LightGBM emerged as the most effective algorithm for combining the ITH score and CT radiological findings to predict preoperative invasiveness in patients with LUAD manifesting as pGGNs. This ternary classification model shows promise in assisting with the preoperative care of patients with LUAD. Therefore, future studies should focus on validation through prospective research that includes follow-up data, comparisons with radiologists’ diagnostic judgments, exploration of this methodology for other artificial intelligence-supported diagnostic tasks, and improvements in algorithm accuracy.

Supplemental Material

sj-tif-1-tct-10.1177_15330338251365985
sj-tif-2-tct-10.1177_15330338251365985
sj-docx-3-tct-10.1177_15330338251365985
sj-docx-4-tct-10.1177_15330338251365985

Supplemental material for Intratumoral Heterogeneity Scores as Predictors of Invasiveness in Lung Adenocarcinoma Presenting as Pure Ground-Glass Nodules: Insights from Explainable Machine Learning-Based Ternary Classification Models by Wang Peng, Wanyin Qi, Yunhua Li, Sanhong Zhang and Juan Long in Technology in Cancer Research & Treatment

Footnotes

Abbreviations:

Acknowledgments

The authors thank Na Ren for insightful discussions during the early conceptual development of this study.

ORCID iD

Juan Long

Ethical Considerations

The study was approved by the Ethics Committee of The Affiliated Hospital, Southwest Medical University (KY2020147). The requirement for written informed consent was waived due to the retrospective nature of the study.

Consent to Participate

The requirement for written informed consent was waived owing to the retrospective nature of the study.

Consent for Publication

Not applicable

Author Contributions

Wang Peng contributed to the writing and revision of the manuscript. Juan Long was responsible for conceptualizing and designing the study. Wanyin Qi collected the data. Yunhua Li and Sanhong Zhang provided supervision and contributed to the revision of the manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability

All data included in this study are available from the corresponding author upon request.

Supplemental Material

Supplemental material for this article is available online.
