Abstract

Objective

Lung cancer is primarily categorized into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC), each characterized by distinct therapeutic approaches and prognostic outcomes, particularly in stage III peripheral cases. This study aimed to develop predictive models utilizing clinical and radiomic data to preoperatively differentiate stage III peripheral SCLC from NSCLC.

Method

We conducted a retrospective analysis of 33 stage III peripheral SCLC cases and 99 stage III peripheral NSCLC cases treated at our hospital between January 2016 and July 2024. A total of 1037 radiomic features were extracted from contrast-enhanced CT scans. The cohort was divided into a training set (n = 92) and a test set (n = 40). Radiomic feature selection was performed using the LASSO algorithm, and nine machine learning models were evaluated. The optimal model was employed to compute the radiomics score (Rad-score) and construct a clinical model. A combined model, integrating clinical factors and radiomic features, was assessed for clinical utility through receiver operating characteristic (ROC) curve analysis (area under the curve, AUC), KS statistics and decision curve analysis (DCA). We externally validated the combined model in a group of 84 patients from another hospital.

Results

The logistic regression-based combined model performed best. In the training cohort, AUC values were 0.956, 0.775, and 0.841 for the combined, clinical, and radiomics models, respectively; in the test cohort, they were 0.905, 0.864, and 0.732. The AUC of the combined model was 0.843 in the external validation cohort. The KS statistics and DCA indicated the clinical utility of the combined model, and its Brier score was 0.115.

Conclusion

The integration of clinical parameters and radiomics features within the combined model may hold significant potential for the preoperative differentiation of stage III peripheral SCLC from NSCLC.

Keywords

radiomics, non-small cell lung cancer, small cell lung cancer, peripheral lung cancer, predictive model

Critical Relevance Statement

This research effectively established a model capable of differentiating between stage III peripheral SCLC and NSCLC by integrating preoperative clinicopathological, radiomic, and combined clinical-radiomic features. The model demonstrates significant potential to improve the accuracy of treatment planning for patients with advanced-stage cancers or those who may require neoadjuvant therapy in the future.

Key Points

The study aimed to develop predictive models to distinguish stage III peripheral SCLC from NSCLC.

The model was successfully constructed by incorporating preoperative clinical and radiomic features, thereby holding promise for enhancing the precision of lung cancer treatment.

Introduction

According to the data presented in the Cancer Report by the International Agency for Research on Cancer (IARC), lung cancer remained the leading cause of cancer-related mortality worldwide in 2022.¹ Lung cancer is primarily classified into two major subtypes: small cell lung cancer (SCLC), accounting for 10%–15% of cases, and non-small cell lung cancer (NSCLC), comprising 85% of cases.^2,3 SCLC is characterized by its aggressive nature and rapid proliferation, typically requiring a treatment regimen that combines chemotherapy and radiation therapy.^4,5 The role of surgical intervention in the management of early-stage SCLC continues to be a topic of ongoing debate.⁶ Current guidelines do not universally recommend surgical resection, largely due to the lack of robust, high-level evidence supporting its efficacy. Nevertheless, several retrospective analyses and smaller single-arm trials suggest that surgical treatment may offer benefits for certain patients with early-stage disease.⁷ A population-based analysis revealed that patients diagnosed with stage I or II SCLC who had T1-2N0 disease and underwent surgical treatment demonstrated significantly enhanced overall survival (OS) and lung cancer-specific survival (LCSS) rates compared to those who received radiotherapy as their sole treatment modality.⁸ This finding underscores the potential advantages of surgical intervention in a carefully selected cohort of patients. For individuals with resectable stage IIIA-N2 SCLC, concurrent chemoradiotherapy remains the standard treatment approach.⁹ This contrasts with the management of stage IIIA-N2 NSCLC, where surgical interventions, such as lobectomy or pneumonectomy, may be advantageous. The choice of treatment is influenced by the specific stage and histological type of the tumor and may include adjuvant therapies. 
SCLC is not always characterized by central lesions with mediastinal lymph node metastasis; some cases present as peripheral lesions, which are frequently misdiagnosed clinically as NSCLC. In cases where a patient is diagnosed with resectable stage IIIA-N2 lung cancer, surgical intervention may be considered the primary treatment option.

Within the context of lung cancer diagnosis, computed tomography (CT) scans are the most commonly utilized imaging modality. Numerous CT imaging characteristics have been identified in lung nodules that may assist in predicting malignancy. However, the imaging features specific to lung cancer are limited, and conventional CT imaging analysis predominantly relies on the visual assessment by radiologists, which may introduce interobserver variability.¹⁰ Furthermore, traditional CT analysis faces challenges in distinguishing between SCLC and NSCLC due to overlapping features, complicating visual differentiation in clinical practice. When lung cancer is suspected, a biopsy is performed to confirm the diagnosis and supplement CT imaging. Techniques such as bronchial brushing and CT-guided biopsy carry risks, including post-procedural infection, bleeding, and pneumothorax. Furthermore, pathological diagnosis through invasive biopsy generally assesses a localized area of the tumor rather than the entire neoplasm, thereby complicating comprehensive characterization. Additionally, the results of biopsies are not always promptly accessible, highlighting the imperative need to develop complementary non-invasive techniques for differentiating subtypes of primary lung cancer via radiomic analysis.

Radiomics is an emerging clinical technology designed to extract high-throughput features from medical images of lesions, thereby enhancing clinical decision-making processes.^11,12 In clinical practice, radiomics has been utilized in the diagnosis of lung nodules, facilitating the differentiation between benign and malignant forms, preoperative prediction of nodule types, classification of various NSCLC subtypes and SCLC, prognostic evaluations, prediction of surgical outcomes, and assessment of tumor gene expression patterns and microenvironmental characteristics.^13-15 The widespread implementation of imaging examinations in clinical diagnostics has increased the accessibility of radiomics research. Building on previous studies, this research aims to conduct a preliminary evaluation of the complex clinical parameters, radiomic features, and their integration for the preoperative differentiation of stage III peripheral SCLC from NSCLC.

Materials and Methods

This study was conducted in accordance with the Declaration of Helsinki and was approved by the ethics committees of both participating hospitals, which waived the requirement for informed consent because of the retrospective design.

Patient Selection

The reporting of this study adheres to the TRIPOD guidelines.¹⁶ From January 2016 to July 2024, we retrospectively reviewed all patients diagnosed with stage III peripheral SCLC or NSCLC at our hospital. The inclusion criteria were as follows: (1) a diagnosis of SCLC or NSCLC confirmed by surgical pathology; (2) a solitary, solid nodule on imaging, indicative of peripheral lung cancer; (3) completion of adequate staging procedures, including whole-body PET-CT or brain MRI, and contrast-enhanced CT of the chest and upper abdomen, including the adrenals, with a pathological tumor stage of III; and (4) the availability of comprehensive clinical and pathological data, including analyzable plain and enhanced thin-slice CT images with a slice thickness of 1 mm. Patients were excluded if (1) they had received anti-tumor therapy, such as radiotherapy, chemotherapy, chemoradiotherapy, or molecular targeted therapy, prior to CT examination and pathological diagnosis; (2) they had been diagnosed with other cancer types; or (3) their CT images had been obtained more than two weeks before the pathological diagnosis. The Tumor, Node, Metastasis (TNM) stage was determined in accordance with the ninth edition of the TNM staging system established by the American Joint Committee on Cancer. Based on these criteria, the study included 33 patients with stage III peripheral SCLC. A control group of 99 patients with stage III peripheral NSCLC was randomly selected. Additionally, we collected a dataset (n = 84; 21 patients with SCLC and 63 with NSCLC) from another hospital, covering January 2022 to November 2024, to validate the combined model externally. The inclusion and exclusion criteria were the same as for the development cohort. 
All patient information was anonymized to ensure confidentiality.

CT Image Acquisition

Contrast-enhanced chest CT scans were conducted on all patients using dual-source CT technology (SOMATOM Definition Flash or SOMATOM Force, Siemens Healthineers, Germany). Prior to imaging, patients participated in breathing exercises and maintained breath-hold during inspiration to enhance image quality. Scans were performed with patients in the supine position to minimize artifacts. The imaging range, defined using the sternoclavicular joint and thoracic inlet as reference points, was acquired at a tube voltage of 120 kV and a tube current of 100 mAs. Reconstruction parameters included a slice thickness of 1 mm, a matrix size of 512 × 512, and a pitch of 1.2. Conventional algorithms were employed for the reconstruction process to optimize soft tissue and pulmonary imaging. Following the initial scan, 70-90 mL of the nonionic contrast agent iohexol (300 mg/mL) was administered via the ulnar vein using a high-pressure syringe at a flow rate of 3 mL/s. Dual-phase enhanced imaging was subsequently performed, capturing the arterial phase at 30 s and the venous phase at 90 s after contrast administration. In addition to the standard imaging parameters, the raw data were transmitted to the post-processing terminal for multiplanar reconstruction (MPR). Image feature analysis was conducted independently by two thoracic radiologists, each possessing certification from their respective professional organizations and having 7 and 13 years of experience in chest CT imaging, respectively. The window settings were calibrated to a mediastinal window with a width of 400 HU and a level of 40 HU, as well as a lung window with a width of 1500 HU and a level of −600 HU. 
The study examined several characteristics of the primary lung tumors, including anatomical location (left or right lung; upper, middle, or lower lobe), size (maximum diameter), and morphological features such as shape and margin, categorized as lobular, spiculated, or vermiform/branching. Internal features, including cavities or vacuoles and swamp-like reinforcement, were also evaluated, as were external features such as the pleural indentation sign and associated signs such as emphysema. CT image characteristics were assessed independently by the two radiologists, with any discrepancies resolved through a standardized protocol. Clinical data for each patient were collected by the radiologists from the Electronic Medical Record System, including demographics (gender and age) and clinical history (smoking status). Tumor biomarkers were also documented, specifically progastrin-releasing peptide (proGRP), squamous cell carcinoma antigen (SCC-Ag), carcinoembryonic antigen (CEA), and neuron-specific enolase (NSE). Further details on segmentation, feature extraction, feature selection, and model development are provided below.

Segmentation, Feature Extraction, and Selection

CT images were imported into 3D-Slicer (version 5.0.2, accessible at http://www.slicer.org) and analyzed using lung (1500/-600 Hounsfield Units) and mediastinal (400/40 Hounsfield Units) window settings. A radiologist with 13 years of professional experience, blinded to clinical data, delineated regions of interest (ROIs) slice by slice. Tumor ROIs were defined to encompass all areas within a nodule, including any cavities or vacuoles, while explicitly excluding bronchi, blood vessels, and normal lung tissue. To evaluate intra-observer agreement, the tumor segmentation was repeated for all patients one month later. A second radiologist, with 7 years of experience, independently repeated the segmentation for these patients to assess inter-observer agreement. Intraclass correlation coefficients (ICCs) were used to determine the reproducibility of feature extraction with respect to intra-observer and inter-observer variability.

Radiomic features were extracted using Pyradiomics within the 3D-Slicer software. A total of 1037 radiomic features were derived from the images, including Original, log-sigma-4-0-mm-3D, and Wavelet transformations. The features extracted from the original image included 14 shape factor categories, 18 first-order histogram categories, 24 categories from the gray level co-occurrence matrix (GLCM), 16 categories from the gray level run length matrix (GLRLM), 16 categories from the gray level size zone matrix (GLSZM), 5 categories from the neighboring gray tone difference matrix (NGTDM), and 14 categories from the gray level dependence matrix (GLDM). Except for the shape factor categories, the type and quantity of features extracted from other image types were consistent with those extracted from the original image.
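The reported total of 1037 features can be checked against the per-class counts listed above. The short sketch below assumes that the 14 shape features are computed once and that the remaining classes are computed on every image type; it then searches for the number of image types that reproduces the total. The resulting count of image types is an inference from the arithmetic, not a figure stated in the text, which names only the original image, one Laplacian-of-Gaussian setting, and the wavelet family.

```python
# Per-class feature counts as listed in the text.
SHAPE_FEATURES = 14
NON_SHAPE_FEATURES = {
    "first_order": 18,
    "GLCM": 24,
    "GLRLM": 16,
    "GLSZM": 16,
    "NGTDM": 5,
    "GLDM": 14,
}

# Features computed on each image type (everything except shape).
per_image_type = sum(NON_SHAPE_FEATURES.values())


def total_features(n_image_types: int) -> int:
    """Shape features once, plus the non-shape classes for every image type."""
    return SHAPE_FEATURES + per_image_type * n_image_types


# Assumption: search for the image-type count that matches the reported 1037.
n_image_types = next(n for n in range(1, 50) if total_features(n) == 1037)

print(f"non-shape features per image type: {per_image_type}")
print(f"image types implied by 1037 features: {n_image_types}")
```

Under this assumption the total is reproduced exactly, which suggests that more image types were used than the three families named in the text (for example, several wavelet sub-bands counted separately).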

To address the challenge of dimensionality reduction of radiomic features in relation to the number of events, a three-step sequential methodology was employed. Initially, the interobserver agreement of radiomic features was evaluated, and features with an ICC exceeding 0.75 were selected. Subsequently, features that demonstrated statistical significance in differentiating between SCLC and NSCLC groups were identified. Finally, the LASSO logistic regression was utilized to identify the most informative radiomic features for distinguishing between SCLC and NSCLC within the training cohort. This process incorporated fivefold cross-validation repeated 100 times to minimize the risk of overfitting.
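The first step of this selection, the reproducibility filter, can be sketched as follows. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is assumed here; the feature names and values are hypothetical toy data, and the helper names `icc2_1` and `reproducible_features` are introduced only for illustration.

```python
from statistics import mean


def icc2_1(ratings):
    """ICC(2,1) for a list of rows (one per lesion), each holding one value per rater.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)          # number of lesions
    k = len(ratings[0])       # number of raters (segmentations)
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-lesion mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reproducible_features(seg_a, seg_b, threshold=0.75):
    """Keep features whose ICC between two segmentations exceeds the threshold."""
    kept = []
    for name in seg_a:
        rows = list(zip(seg_a[name], seg_b[name]))
        if icc2_1(rows) > threshold:
            kept.append(name)
    return kept


# Hypothetical feature values from two independent segmentations of four lesions.
reader_1 = {"stable": [1.0, 2.0, 3.0, 4.0], "noisy": [1.0, 2.0, 3.0, 4.0]}
reader_2 = {"stable": [1.1, 2.0, 2.9, 4.1], "noisy": [4.0, 1.0, 3.0, 2.0]}

print(reproducible_features(reader_1, reader_2))
```

Only features that pass this filter would move on to the univariate screening and LASSO steps described above.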

Model Development

Machine learning models were subsequently developed using variables that demonstrated statistical significance in the multivariate analysis. The study followed these procedural steps: (1) Patients were randomly assigned to training and testing cohorts in a 7:3 ratio; (2) Nine machine learning models were constructed using statistically significant factors identified in the training dataset, specifically: XGBoost (Extreme Gradient Boosting), LR (Logistic Regression), LightGBM (Light Gradient Boosting Machine), RF (Random Forest), AdaBoost (Adaptive Boosting), GNB (Gaussian Naive Bayes), SVM (Polynomial Support Vector Machine), MLP (Multilayer Perceptron), and KNN (k-Nearest Neighbor). Optimal parameters for these nine models were identified using 5-fold cross-validation. To address the risks of data imbalance and overfitting, the Synthetic Minority Over-sampling Technique (SMOTE) was employed, along with class-weight adjustments. A 5-fold cross-validation procedure was used to evaluate the performance of these models and determine the most efficient one. The primary evaluation metrics comprised AUC, accuracy, sensitivity, and specificity. Models exhibiting overfitting among the nine machine learning algorithms were excluded, and the most predictive classifier was selected based on ROC analysis. A predictive model was developed by integrating both clinical and CT data.
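The evaluation metrics named above can be computed directly from predicted probabilities. The sketch below is a minimal illustration under stated assumptions: SCLC is treated as the positive class (label 1), the classification threshold of 0.5 is arbitrary, and the labels and scores are toy values rather than study data.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed probability threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy example: 1 = SCLC (assumed positive class), 0 = NSCLC.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]

auc = roc_auc(labels, probs)
metrics = confusion_metrics(labels, probs)
print(f"AUC = {auc:.3f}", metrics)
```

Because AUC is threshold-free while sensitivity and specificity depend on the chosen cut-off, the two kinds of metric can rank models differently, which is one reason to report them together.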

Subsequently, a statistical comparison was performed among three models: the clinical model, the radiomics model, and the combined model, which incorporates both clinical factors and radiomics features, to determine the model with the highest predictive accuracy.

Statistical Analysis

Statistical analyses were conducted using Python version 3.7, with patients randomly allocated to training and test groups in a 7:3 ratio. Radiomic features were normalized through Z-score transformation, and baseline data were analyzed using univariate methods via Python's statsmodels version 0.11.1. Categorical variables were evaluated using chi-square tests, while continuous variables were assessed using either t-tests or Mann-Whitney U tests. Variables exhibiting statistically significant differences (P < .05) were subsequently included in the multivariate logistic regression analysis. The multivariate analysis identified clinical and CT features with statistically significant differences (P < .05), which were then used to develop a clinical prediction model. The model's performance was assessed using key metrics, including AUC, accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. To evaluate the clinical utility of the three models, DCA and Kolmogorov-Smirnov statistical plots were employed. The methodology outlined in the Methods section ensures reproducibility by other researchers.
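As an illustration of the chi-square comparison of categorical variables, the sketch below computes a Pearson chi-square test for a 2 × 2 table using only the standard library. The text does not say whether a continuity correction or Fisher's exact test was applied to sparse tables, so the uncorrected Pearson statistic is an assumption here. The counts are the training-cohort spiculation counts from Table 1 (NSCLC: 34 without, 32 with; SCLC: 25 without, 1 with).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its P value on 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: NSCLC, SCLC; columns: spiculation absent, spiculation present.
chi2, p_value = chi_square_2x2(34, 32, 25, 1)
print(f"chi2 = {chi2:.2f}, P < .001: {p_value < 0.001}")
```

Under this assumption the result agrees with the P < .001 reported for spiculation in the training cohort.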

Results

Patient Characteristics

This retrospective study included a total of 132 patients, comprising 33 with stage III peripheral SCLC and 99 with stage III peripheral NSCLC. Participants were randomly assigned to two cohorts in a 7:3 ratio, resulting in 92 patients in the training cohort (26 with SCLC and 66 with NSCLC) and 40 patients in the test cohort (7 with SCLC and 33 with NSCLC). There was no significant age difference between the two groups, with both having a median age of 62 years. The proportion of male patients was numerically higher in the SCLC cohort than in the NSCLC cohort (70.00% vs 66.00%), but the difference was not statistically significant (P = .670). Likewise, no statistically significant difference was observed in the prevalence of smoking history between the SCLC and NSCLC groups (55.00% vs 51.00%; P = .688). However, emphysema was significantly more prevalent in the SCLC group than in the NSCLC group (33.00% vs 17.00%; P = .049).

According to CT findings, patients with NSCLC demonstrated a significantly higher incidence of the spiculated sign than those with SCLC (52.5% vs 3.03%; P < .001). Pleural indentation on CT lung window images was likewise more frequent in NSCLC than in SCLC (64.65% vs 6.06%; P < .001). Conversely, vermiform/branching signs were more prevalent in SCLC than in NSCLC (24.24% vs 1.01%). Additionally, swamp-like reinforcement during the venous phase was significantly more frequent in SCLC than in NSCLC (15.15% vs 2.02%; P = .004).

In relation to tumor marker levels, the prevalence of abnormal proGRP and NSE was significantly higher in the SCLC cohort compared to the NSCLC cohort (P < .001 and P = .006, respectively). In contrast, the prevalence of abnormal CEA and SCC-Ag was significantly lower in the SCLC cohort than in the NSCLC cohort (P < .001 and P = .002, respectively).

The patient characteristics for both the training and testing cohorts are comprehensively presented in Table 1.

Table 1.

Clinical Characteristics of the Patients.

Characteristics		Training cohort			Testing cohort
		NSCLC	SCLC	P	NSCLC	SCLC	P
Gender	Female	44	18	.813	21	5	.695
	Male	22	8		12	2
Smoking	No	31	12	.944	18	3	.574
	Yes	35	14		15	4
Emp	No	53	17	.131	29	5	.268
	Yes	13	9		4	2
Necrosis	No	49	26	<.001	26	7	<.001
	Yes	17	0		7	0
Lobul	No	4	7	.005	2	2	.071
	Yes	62	19		31	5
Spicul	No	34	25	<.001	13	7	<.001
	Yes	32	1		20	0
Vermiform	No	65	21	<.001	33	4	<.001
	Yes	1	5		0	3
Swamp	No	65	23	.034	32	5	.020
	Yes	1	3		1	2
Cavity	No	45	26	<.001	25	7	<.001
	Yes	21	0		8	0
Lower lobe	No	36	10	.165	19	2	.163
	Yes	30	16		14	5
Ple	No	26	24	<.001	9	7	<.001
	Yes	40	2		24	0
proGRP median		46.565	148.300	<.001	52.657	341.900	.014
NSE median		14.692	16.230	.005	14.897	15.980	.379
CEA median		3.851	3.020	<.001	3.851	3.450	.229
SCC median		1.370	0.870	.005	1.410	0.800	.190
Age median		61.000	64.000	.208	62.455	57.286	.186

Abbreviations: NSCLC, non-small cell lung cancer; SCLC, small cell lung cancer; Lobul, lobulation; Spicul, spiculation; Cavity, cavities or vacuoles; Swamp, swamp-like reinforcement; Ple, pleural indentation sign; Vermiform, vermiform/branching; Emp, emphysema; proGRP, progastrin-releasing peptide; NSE, neuron-specific enolase; CEA, carcinoembryonic antigen; SCC, squamous cell carcinoma antigen.

Radiomics Feature Selection and Model Construction

Initially, a total of 1037 radiomic features were extracted. During feature elimination, 425 features that did not differ significantly between SCLC and NSCLC were removed, along with 255 features that exhibited high correlation and ICC values below 0.75. Subsequent LASSO screening, with λ at the minimum criterion and at one standard error of the minimum (0.027 and 0.119, respectively), retained four robust radiomic features: energy, complexity, T90Percentile, and correlation, at λ = 0.119, as depicted in Figures 1A and 1B.

Figure 1.

Radiomics and Clinical Features Selection with LASSO. 1A and 1C: The x-axis Represents log (λ), and the Numbers Above the x-axis Represent the Average Number of Predictive Variables. The Red Dot Represents the Average Deviance of Each Model at a Given λ, and the Vertical Bar Through the Red Dot Represents the Upper and Lower Limits of the Deviance. The Vertical Dotted Line Marks the log (λ) Value Corresponding to the Best λ Value, Selected by the Minimum Standard. By Tuning the Parameter λ, the Binomial Deviance of the Model is Minimized, and the Feature Set With the Best Performance is Selected. 1B and 1D: Coefficient Profiles Plotted Against log (λ). The Dotted Line Marks the Selected λ Value; Features With Non-Zero Coefficients at This λ are Retained. After Screening out the Redundant Features by LASSO, the Four Most Robust Radiomics Features (Energy, Complexity, T90Percentile, Correlation) Were Retained, With λ = 0.119.

In the training dataset, nine machine learning radiomics prediction models were developed, including XGB, LGBM, RF, AdaBoost Classifier, GNB, LR, MLP, SVC, and K-Neighbors Classifier, using four radiomics features. Among these models, the Logistic Regression Classifier demonstrated superior performance, achieving AUC values of 0.841 on the training dataset and 0.847 on the validation dataset, as presented in Tables 2 and 3, and Figure 2. Furthermore, a variable number-model rating analysis was employed to optimize the model, resulting in the selection of complexity, T90Percentile, and correlation as key features. This optimized model achieved AUC values of 0.840 on the training dataset and 0.903 on the validation dataset, as illustrated in Figure 3. The KS statistical chart further validated the model's efficacy, with a KS value of 0.541 achieved under the optimal prediction probability threshold (Figure 4).
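The KS value reported above can be read as the largest separation between the true-positive rate and the false-positive rate across all probability thresholds. The sketch below shows that computation on toy data; SCLC is again assumed to be the positive class, and the helper name `ks_statistic` is introduced here for illustration only.

```python
def ks_statistic(y_true, scores):
    """Return (KS, threshold): the maximum |TPR - FPR| and the score cut-off reaching it."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best_ks, best_threshold = 0.0, None
    # Evaluate every observed score as a candidate threshold, highest first.
    for threshold in sorted(set(scores), reverse=True):
        tpr = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold) / n_pos
        fpr = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold) / n_neg
        if abs(tpr - fpr) > best_ks:
            best_ks, best_threshold = abs(tpr - fpr), threshold
    return best_ks, best_threshold


# Toy example: 1 = SCLC (assumed positive class), 0 = NSCLC.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]

ks, cutoff = ks_statistic(labels, probs)
print(f"KS = {ks:.3f} at threshold {cutoff}")
```

The threshold that maximizes KS is the same one that maximizes Youden's index (sensitivity + specificity - 1), which is why it is often described as the optimal prediction probability threshold.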

Figure 2.

Receiver Operating Characteristics (ROC) Curves of the Nine Machine Learning Models in the Training Dataset.

Figure 3.

Variable Number-Model Rating Analysis was Used to Optimize the Model; Complexity, T90Percentile, and Correlation Were Finally Selected, Giving AUC Values of 0.840 on the Training Dataset and 0.903 on the Validation Dataset. 3A Represents the Weight Value of Each Feature in the Model. 3B Represents the Mean AUC Value of the Model with Different Features.

Figure 4.

The KS Statistical Chart for Radiomics Model (4A) and Clinical Model (4B).

Table 2.

Performance Metrics for Nine Models in the Training Dataset.

Model	AUC(SD)	Accuracy(SD)	Sensitivity(SD)	Specificity(SD)
XGBoost	1.000(0.000)	1.000(0.000)	1.000(0.000)	1.000(0.000)
logistic	0.841(0.011)	0.778(0.041)	0.806(0.070)	0.767(0.081)
LightGBM	0.915(0.019)	0.854(0.021)	0.901(0.046)	0.838(0.045)
RandomForest	1.000(0.000)	1.000(0.000)	1.000(0.000)	1.000(0.000)
AdaBoost	1.000(0.000)	1.000(0.000)	1.000(0.000)	1.000(0.000)
GNB	0.710(0.047)	0.698(0.041)	0.880(0.020)	0.615(0.052)
SVM	0.501(0.264)	0.505(0.185)	0.946(0.066)	0.331(0.278)
KNN	0.860(0.009)	0.775(0.058)	0.753(0.128)	0.775(0.136)
MLP	0.500(0.000)	0.702(0.021)	0.000(0.000)	1.000(0.000)

Abbreviations: AUC, area under the curve; XGBoost, Extreme Gradient Boosting; SVM, polynomial support vector machine; LightGBM, Light Gradient Boosting Machine; AdaBoost, Adaptive boosting; GNB, Gaussian naive bayes; MLP, Multilayer Perceptron; KNN, k-Nearest Neighbor.

Table 3.

Performance Metrics for Nine Models in the Validation Dataset.

Model	AUC(SD)	Accuracy(SD)	Sensitivity(SD)	Specificity(SD)
XGBoost	0.749(0.067)	0.743(0.027)	0.579(0.145)	0.823(0.078)
logistic	0.847(0.015)	0.779(0.047)	0.783(0.178)	0.775(0.109)
LightGBM	0.795(0.057)	0.729(0.058)	0.624(0.222)	0.766(0.046)
RandomForest	0.778(0.094)	0.757(0.057)	0.476(0.236)	0.888(0.078)
AdaBoost	0.797(0.049)	0.750(0.064)	0.365(0.198)	0.913(0.031)
GNB	0.687(0.056)	0.629(0.058)	0.845(0.094)	0.570(0.094)
SVM	0.431(0.288)	0.464(0.156)	0.850(0.099)	0.300(0.257)
KNN	0.701(0.052)	0.621(0.112)	0.530(0.141)	0.651(0.155)
MLP	0.500(0.000)	0.743(0.047)	0.000(0.000)	1.000(0.000)

Feature Selection and Clinical Model Construction

In the training dataset, LASSO screening analysis identified five clinical features—namely, the spiculated sign, vermiform/branching sign, swamp-like reinforcement, pleural indentation sign, and proGRP—as key discriminators between NSCLC and SCLC at λ = 0.095, as depicted in Figures 1C and 1D. Utilizing these five features, a clinical prediction model was developed using LR machine learning. The AUC values for the clinical models in distinguishing SCLC from NSCLC were 0.775 in the training cohort and 0.864 in the test cohort. Additional details are presented in Figure 5. The model's efficacy was further corroborated by the KS statistical chart, which demonstrated a KS value of 0.697 at the optimal prediction probability threshold (Figure 4B).

Figure 5.

Comparison of Receiver Operating Characteristic (ROC) Curves Among the Clinical Model, Radiomics Model and Combined Model in the Training Cohorts (A) and Testing Cohorts (B). The AUC values in the combined model were better than those in the clinical model and radiomics model for the prediction of SCLC.

Combined Model Construction and Validation of Performance

In this study, a composite model was developed using an LR classifier that integrated five clinical characteristics with three radiomic features. The performance of this composite model exceeded that of the individual clinical and radiomic models, with ROC-AUC values of 0.956 and 0.905 for the training and test cohorts, respectively (see Figure 5). The sensitivity and specificity were 0.961 versus 0.887 and 0.862 versus 0.831 in the training and test groups, respectively. Moreover, after applying SMOTE to address potential data imbalance and overfitting, the composite model achieved ROC-AUC values of 0.977 and 0.922 for the training and test cohorts (Figure 6). The sensitivity and specificity were 1.00 versus 0.972 and 0.894 versus 0.878 in the training and test groups, respectively. The model's robustness was further corroborated by the KS statistical chart, which showed a KS value of 0.848 at the optimal prediction probability threshold (Figure 7). The DCA indicated that the integrated model for preoperative differentiation between stage III peripheral SCLC and NSCLC provided higher net benefit than the clinical and radiomics models in both the training and test cohorts, as depicted in Figure 8. The Brier score for the entire cohort was 0.115, with the corresponding calibration plots provided. Finally, a nomogram was established to predict the risk of SCLC for each patient (Figure 9). The nomogram points and the risk were calculated based on the nomogram, and the best cut-off value of the risk to predict SCLC was 0.772.
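Two of the quantities reported above, the Brier score and the net benefit plotted in a decision curve, have simple closed forms. The sketch below uses the standard definitions: the Brier score is the mean squared difference between predicted probability and outcome, and net benefit at threshold probability pt is TP/n − (FP/n) × pt/(1 − pt). The labels and probabilities are toy values, not study data, and SCLC is assumed to be the positive class.

```python
def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def net_benefit(y_true, probs, pt):
    """Net benefit of acting on predictions at threshold probability pt (0 < pt < 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference strategy: classify every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Toy example: 1 = SCLC (assumed positive class), 0 = NSCLC.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]

brier = brier_score(labels, probs)
nb_model = net_benefit(labels, probs, pt=0.5)
nb_all = net_benefit_treat_all(labels, pt=0.5)
print(f"Brier = {brier:.3f}, net benefit (model) = {nb_model:.3f}, (treat all) = {nb_all:.3f}")
```

A decision curve repeats the net-benefit calculation over a range of pt values and compares the model against the treat-all and treat-none (net benefit 0) strategies. The Brier score, by contrast, summarizes calibration and discrimination together rather than clinical utility.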

Figure 6.

Receiver Operating Characteristic (ROC) Curves of the Combined Model in the Training Cohort (A) and Testing Cohort (B) After the Resampling Technique (SMOTE) was Used to Mitigate the Risks of Data Imbalance and Overfitting.

Figure 7.

The KS Statistical Chart for Combined Model.

Figure 8.

Decision Curve Analyses for the Radiomics-Clinical Model Compared with the Radiomics Model and Clinical Model in the Training Cohort (A) and the Testing Cohort (B). Decision curve analysis showed that the net benefits of the combined model for the prediction of SCLC were higher than those of the clinical model and radiomics model.

Figure 9.

The Nomogram of the Combined Model.

The external validation cohort comprised 84 patients (21 patients with SCLC, 63 patients with NSCLC). Table 4 presents the baseline characteristics of this external group. The AUC of the combined model for distinguishing SCLC from NSCLC was 0.843 in the external validation cohort. The DCA revealed that when the threshold probability was above 0%, the net benefit of the combined model for the prediction of SCLC was high (Figures 10A and 10B).

Figure 10.

Receiver Operating Characteristics (ROC) Curves (10A) and Decision Curve Analysis (10B) for the Combined Model in the External Validation Cohort.

Table 4.

Clinical Characteristics of the Patients in External Validation Cohort.

Characteristics		External validation cohort
		NSCLC	SCLC	P
Gender	Female	41	14	1.000
	Male	22	7
Smoking	No	30	11	.900
	Yes	33	10
Emp	No	52	9	.001
	Yes	11	12
Necrosis	No	53	19	.721
	Yes	10	2
Lobul	No	4	1	.674
	Yes	62	20
Spicul	No	34	20	.002
	Yes	29	1
Vermiform	No	59	8	<.001
	Yes	4	13
Swamp	No	58	10	<.001
	Yes	5	11
Cavity	No	51	21	.032
	Yes	12	0
Lower lobe	No	33	8	.3378
	Yes	30	13
Ple	No	28	20	<.001
	Yes	35	1
proGRP median		52.625	181.950	.002
NSE median		14.897	18.420	.039
CEA median		3.841	2.500	.002
SCC median		1.200	0.622	.003
Age median		62.000	64.000	.13

Discussion

This retrospective study identified significant differences in both qualitative and quantitative clinical characteristics between patients with stage III peripheral SCLC and NSCLC. To distinguish stage III peripheral SCLC from NSCLC, five clinical variables were selected: spiculated sign, vermiform/branching sign, swamp-like reinforcement, pleural indentation sign, and proGRP. For the development of a radiomic prediction model, three radiomic features were chosen: complexity, T90Percentile, and correlation. The integrated clinical-radiomics model, which incorporated five clinical features alongside three radiomic parameters, demonstrated robust predictive capabilities in both the training and validation phases. Moreover, statistically significant distinctions were observed among the clinical model, the radiomics model, and the integrated clinical-radiomics model. Notably, the combined clinical-radiomics model exhibited superior performance compared to each individual model.

CT imaging of peripheral SCLC tumors typically demonstrates homogeneous density, swamp-like enhancement, and a vermiform or branching pattern with smooth margins and little spiculation or pleural indentation. The vermiform or branching configuration, identified by a spindle-like morphology with its major axis oriented towards the hilum, represents an aggregate of two or more coalesced nodules and serves as a diagnostic marker for peripheral SCLC. Previous studies have described this vermiform or branching morphology using terms such as multinodular shape and fusiform beaded appearance.¹⁷,¹⁸ The presence of the vermiform or branching sign may be associated with tumors that proliferate along bronchial or vascular walls with a short doubling time, leading to asymmetric tumor growth due to compression by surrounding tissues. As the tumor expands around the bronchioles, it forms multiple contiguous nodules, which may account for the multinodular nature of peripheral SCLC.

The swamp-like reinforcement pattern is characterized by a lack of substantial unenhanced regions. Macroscopically, this corresponds to small amounts of necrotic and stromal tissue interspersed among the tumor nests, without extensive necrotic areas within the tumors. This pathological observation has been documented previously and is in line with the enhancement pattern described by Kazawa and Taiga.¹⁷,¹⁹ It contrasts with NSCLC, which is characterized by larger areas of necrotic tissue, whereas peripheral SCLC lacks extensive necrosis. Consequently, the swamp-like reinforcement pattern of a nodule may serve as an important diagnostic indicator for differentiating types of malignant nodules.

In comparison to peripheral NSCLC, peripheral SCLC demonstrates a lower incidence of spiculation and pleural indentation signs, corroborating previous research findings.²⁰ In peripheral NSCLC, tumor cells possess the ability to invade the surrounding pulmonary parenchyma and thicken the interstitial space. This phenomenon is less pronounced in SCLC, likely attributable to its pathological characteristics, particularly the paucity of fibrous tissue. As a result, the impact on surrounding structures is minimal, leading to the infrequent occurrence of pleural indentation signs in SCLC. These observations are consistent with prior studies.²¹ Regarding tumor markers, the levels of pro-gastrin-releasing peptide (proGRP) in peripheral SCLC groups were significantly elevated compared to those in peripheral NSCLC groups, aligning with findings reported in previous research.²¹ Therefore, in instances where imaging signs are insufficient, tumor markers may prove beneficial for differentiation.

Previous studies have demonstrated that radiomic features derived from CT scans can preoperatively differentiate peripheral SCLC from NSCLC. Linning et al developed four radiomic classification models using these extracted features to evaluate phenotypic differences between SCLC and NSCLC, as well as among NSCLC subtypes, achieving an AUC of 0.82.²² Similarly, Chen et al constructed a CT radiomic model employing a neural network classifier to distinguish peripheral SCLC from NSCLC, resulting in an AUC of 0.93.¹⁵ Li et al developed a CT radiomic model using a feedforward neural network classifier to differentiate central SCLC from NSCLC, achieving an AUC of 0.78.²³ In our study, we employed a radiomics approach to enhance the differentiation of stage III peripheral SCLC from NSCLC. The images were transformed into higher-dimensional data to extract the relevant features. This method facilitated the high-throughput extraction of quantitative features from standard medical images, followed by automated analysis to support clinical decision-making. In this study, three independent radiomic features were incorporated into the final model: T90Percentile, complexity, and correlation. These features are classified under the Histogram Parameter, Neighboring Gray Tone Difference Matrix (NGTDM), and Texture Parameters, respectively. The histogram-derived parameter, T90Percentile, is the intensity value below which 90% of the observed values in the region of interest fall. The complexity feature, derived from the NGTDM, is high when an image is non-uniform and contains many rapid changes in gray level. Conversely, the correlation feature, categorized under Texture Parameters, represents the degree of similarity in gray levels between adjacent pixels.
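To make two of these definitions concrete, the following Python sketch computes a 90th percentile and a neighbor correlation on toy gray-level grids. It is only an illustration under stated assumptions: the percentile uses linear interpolation, the correlation is a Pearson correlation over horizontally adjacent pixel pairs (the idea behind co-occurrence-based correlation), and the function names are hypothetical. The definitions in the feature extraction software used in the study may differ, and the NGTDM complexity feature is omitted for brevity.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def neighbor_correlation(image: List[List[float]]) -> float:
    """Pearson correlation between horizontally adjacent pixel values."""
    pairs = []
    for row in image:
        for left, right in zip(row, row[1:]):
            pairs.append((left, right))
            pairs.append((right, left))  # count both directions (symmetric)
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a perfectly flat image
    return cov / math.sqrt(var_x * var_y)


# Toy grids, not CT data.
smooth = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]    # gradual change between neighbors
checker = [[0, 9, 0], [9, 0, 9], [0, 9, 0]]   # abrupt alternation between neighbors
```

On these toy grids the smoothly varying image gives a positive correlation, whereas the alternating image gives a strongly negative one, which matches the intuition that correlation is higher when adjacent gray levels are similar.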
The differences in these three radiomic features between SCLC and NSCLC can be interpreted as follows. SCLC typically presents with uniform density and rapid growth, corresponding to a lower complexity value and a higher correlation value, both of which reflect uniformity of density. In contrast, NSCLC is characterized by pronounced heterogeneity, often accompanied by necrotic vacuoles (areas of cell death), which results in a notable increase in T90Percentile and complexity and a decrease in correlation. Using these radiomic features, we developed a radiomics model that achieved AUC values of 0.841 and 0.732 in the training and test cohorts, respectively. The integrated clinical-radiomics model was significantly superior to both the clinical model and the radiomics model, and it also performed well in the external validation cohort, with an AUC of 0.843. Furthermore, KS statistics and DCA supported the enhanced predictive capability of the combined model compared with the separate clinical and radiomics models. KS statistics and DCA complement traditional performance metrics such as discrimination and calibration: the KS statistic quantifies how well the predicted scores separate the two groups, and DCA estimates the net clinical benefit of model-guided decisions across threshold probabilities.
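For readers less familiar with these metrics, the following Python sketch shows one common way to compute a two-sample KS statistic and a Brier score from predicted probabilities. The data are synthetic and the function names are hypothetical; it is a minimal sketch, not the analysis code of this study.

```python
from typing import Sequence


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the empirical score distributions of the two classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    largest_gap = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        largest_gap = max(largest_gap, abs(cdf_pos - cdf_neg))
    return largest_gap


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Synthetic example only (1 = SCLC, 0 = NSCLC); not patient data.
example_labels = [1, 1, 0, 1, 0, 0]
example_probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
```

A KS statistic close to 1 indicates that the predicted scores of the two groups barely overlap. A Brier score close to 0 indicates that the predicted probabilities are close to the observed outcomes.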

The present study is subject to several limitations that warrant consideration. Firstly, due to the rarity of stage III peripheral SCLC, only two institutions were included in the study, and no power calculations were performed prior to the study. Secondly, the focus on patients with postoperative pathological results may introduce selection bias. Thirdly, the study exclusively assessed the primary tumor lesion without evaluating lymph node status. Another key limitation of this study is the absence of direct comparison or integration with clinical outcomes or prospective patient data. Future validation of this model will be conducted through a multicenter prospective study, followed by subsequent optimization.

Conclusion

Our research established models capable of differentiating stage III peripheral SCLC from NSCLC using preoperative clinicopathological features, radiomic features, and their combination. Moreover, the combined clinical-radiomic model exhibited robust predictive capabilities, indicating its potential utility in clinical settings. Postoperative pathological analysis of the tumor remains the most precise approach for diagnosing stage III peripheral SCLC and NSCLC; the combined model may complement it by supporting differentiation before surgery. This model holds promise for improving the accuracy of treatment planning for patients with advanced-stage cancers or those who may require neoadjuvant therapy in the future.

Footnotes

ORCID iDs

Junjie Zhang

Ligang Hao

Ethics Approval and Consent to Participate

The Xing Tai People's Hospital ethical review board approved this retrospective analysis and waived informed consent requirements (Ethics Committee of Xing Tai People's Hospital, reference number: 2023 (124), dated 2023.11.15). The study was also approved by the Ethics Committee of the Fourth Hospital of Hebei Medical University (reference number: 2022KS017, dated 2022.6.27).

Consent for Publication

We confirm that there has been no publication, submission, or acceptance elsewhere of the manuscript other than this journal. All potentially identifiable images or data in this article were published with the written consent of the individuals involved.

Authors’ Contributions

JJ Z and LG H performed the experiments and wrote the manuscript. QX Z, LN Z, Q X and FX G were responsible for designing the experiments. All authors read and approved the final version of this submitted manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Key development plan of Xingtai (2023ZC049).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Availability of Data and Materials

The datasets produced and examined in the present investigation are not accessible to the public at this time due to ongoing analysis for future publications, although they can be obtained from the corresponding author on reasonable request.

References

1. Bray, Laversanne, Sung, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263.

2. Deng, et al. Survival patterns for patients with resected N2 non-small cell lung cancer and postoperative radiotherapy: A prognostic scoring model and heat map approach. J Thorac Oncol. 2018;13(12):1968–1974.

3. et al. Cancer gene profiling in non-small cell lung cancers reveals activating mutations in JAK2 and JAK3 with therapeutic implications. Genome Med. 2017;9(1):89.

4. Batenchuk, Badzio, et al. PD-L1 expression by two complementary diagnostic assays and mRNA in situ hybridization in small cell lung cancer. J Thorac Oncol. 2017;12(1):110–120.

5. Almquist, Mosalpuria, Ganti. Multimodality therapy for limited-stage small-cell lung cancer. J Oncol Pract. 2016;12(2):111–117.

6. Hartshorn, Bradbury, Lanza, et al. Nanotechnology strategies to advance outcomes in clinical cancer care. ACS Nano. 2018;12(1):24–43.

7. Cox, Yang, Speicher, et al. The role of extent of surgical resection and lymph node assessment for clinical stage I pulmonary lepidic adenocarcinoma: An analysis of 1991 patients. J Thorac Oncol. 2017;12(4):689–696.

8. Yang, Chan, Shah, et al. Long-term survival after surgery compared with concurrent chemoradiation for node-negative small cell lung cancer. Ann Surg. 2018;268(6):1105–1112.

9. Manoharan, Salem, Mistry, et al. (18)F-fludeoxyglucose PET/CT in SCLC: Analysis of the CONVERT randomized controlled trial. J Thorac Oncol. 2019;14(7):1296–1305.

10. Wilson, Devaraj. Radiomics of pulmonary nodules and lung cancer. Transl Lung Cancer Res. 2017;6(1):86–91.

11. Wen, Yang, Zhu, et al. Pretreatment CT-based radiomics signature as a potential imaging biomarker for predicting the expression of PD-L1 and CD8+ TILs in ESCC. Onco Targets Ther. 2020;13:12003–12013.

12. Lee, Park, et al. Radiomics and its emerging role in lung cancer research, imaging biomarkers and clinical management: State of the art. Eur J Radiol. 2017;86:297–307.

13. Woodruff, Sanduleanu, et al. Preoperative CT-based radiomics combined with intraoperative frozen section is predictive of invasive adenocarcinoma in pulmonary nodules: A multicenter study. Eur Radiol. 2020;30(5):2680–2691.

14. She, Zhang, Zhu, et al. The predictive value of CT-based radiomics in differentiating indolent from invasive lung adenocarcinoma in patients with pulmonary nodules. Eur Radiol. 2018;28(12):5121–5128.

15. Chen, et al. Differentiating peripherally-located small cell lung cancer from non-small cell lung cancer using a CT radiomic approach. Front Oncol. 2020;10:593.

16. Collins, Reitsma, Altman, Moons. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMJ (Clinical Research ed). 2015;350:g7594.

17. Kobayashi, Tanaka, Matsumoto, et al. HRCT findings of small cell lung cancer measuring 30 mm or less located in the peripheral lung. Jpn J Radiol. 2015;33(2):67–75.

18. Sone, Nakayama, Honda, et al. CT findings of early-stage small cell lung cancer in a low-dose CT screening programme. Lung Cancer (Amsterdam, Netherlands). 2007;56(2):207–215.

19. Kazawa, Kitaichi, Hiraoka, et al. Small cell lung carcinoma: Eight types of extension and spread on computed tomography. J Comput Assist Tomogr. 2006;30(4):653–661.

20. Ren, Cao, Wei, Shen. Diagnostic accuracy of computed tomography imaging for the detection of differences between peripheral small cell lung cancer and peripheral non-small cell lung cancer. Int J Clin Oncol. 2017;22(5):865–871.

21. Zhang, Lin, Chu. Clinical and computed tomography characteristics for early diagnosis of peripheral small-cell lung cancer. Cancer Manag Res. 2022;14:589–601.

22. Linning, Yang, Schwartz, Zhao. Radiomics for classification of lung cancer histological subtypes based on nonenhanced computed tomography. Acad Radiol. 2019;26(9):1245–1252.

23. Gao, et al. Radiomics-based features for prediction of histological subtypes in central lung cancer. Front Oncol. 2021;11:658887.

Development and Validation of Predictive Models for Differentiating Resectable Stage III Peripheral SCLC from NSCLC Using Radiomic Features and Clinical Parameters
