Sage Journals: Discover world-class research

Abstract

Background:

The Ki-67 antigen, a marker of cell proliferation, serves as a biomarker for assessing tumor malignancy. However, measuring Ki-67 by immunohistochemistry can be difficult because adequate specimens are not always obtainable and some patients' health conditions preclude biopsy. Radiological analysis has emerged as a potential alternative for predicting Ki-67 levels, although its accuracy has been limited. This study aims to improve the prediction of Ki-67 levels from chest X-rays by combining features computed from manually delineated tumor regions with conventional imaging characteristics.

Methods:

This study collected chest X-ray images and Ki-67 expression data from 109 patients diagnosed with Non-Small Cell Lung Cancer (NSCLC). Clinical professionals annotated 7 radiological features related to tumor progression on each image. Tumor regions were manually delineated with a Python-based tool, and 5 types of shape data were computed from these regions. The combined data were used to develop predictive models based on Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Deep Neural Networks (DNN), followed by feature selection.

Results:

Using the RF model, 8 predictive features were selected from the combined data, 7 of which showed a linear association with Ki-67 level (Mantel-Haenszel test, P < .05). The final model achieved an accuracy of 0.818, a precision of 0.823, a recall of 0.849, and an F1-score of 0.783.

Conclusions:

This research demonstrates the effectiveness of integrating features computed from manually delineated regions of interest (ROIs) with traditional imaging characteristics and machine learning techniques. This approach substantially improves the accuracy with which chest X-rays predict Ki-67 levels, offering a non-invasive method for Ki-67 estimation.

Keywords

carcinoma, non-small-cell lung; machine learning; neural networks, computer; deep learning; forecasting; Ki-67 antigen

Graphical abstract

Introduction

Ki-67 is a marker of cellular proliferation activity and holds significant value in the diagnosis and prognostic assessment of tumors.¹ High Ki-67 expression is often associated with increased tumor aggressiveness and a poorer prognosis,² making it widely utilized in the diagnosis and prognostic evaluation of lung cancer.

In clinical practice, factors such as the location of a patient’s tumor and the patient’s physical condition can prevent successful acquisition of tumor tissue for pathological examination.^3-6 Hence, Luo, Wang, and colleagues have investigated the use of imaging data and machine learning to predict Ki-67 levels.^7,8 However, previous studies on predicting Ki-67 in NSCLC have mostly been limited to binary classification models. Although these models can achieve relatively high area under the curve (AUC) values, they offer limited insight into the biological behavior of tumors, so their application in clinical diagnosis and treatment remains challenging.

To address this issue, this study uses chest X-ray examinations, which provide information about the location, size, shape, and tissue structure of masses and, indirectly, about their biological behavior.^9,10 By manually processing patients’ imaging data and combining it with machine learning models that predict Ki-67,^11,12 we aim to help clinicians assess the biological characteristics of lung cancer more accurately and non-invasively, thereby supporting the development of personalized treatment strategies.

Methods

Figure 1 gives an overview of the research process. This study was approved by the Medical Ethics Committee of the Affiliated Cancer Hospital and Institute of Guangzhou Medical University (project number 2024-ZN030). This article adheres to the TRIPOD 2015 statement.¹³ [Checklist-Chinese-1]

Figure 1.

Flowchart.

Data Collection and Organization

Dependent Variable Set

This study collected chest X-ray images and immunohistochemically determined Ki-67 levels for 109 patients diagnosed with NSCLC at the Affiliated Tumor Hospital of Guangzhou Medical University over the past 10 years (June 1, 2014, to June 1, 2024).

Inclusion criteria: (1) NSCLC definitively diagnosed by Computed Tomography-guided Needle Biopsy (CTNB) or Endobronchial Ultrasound-guided Transbronchial Needle Aspiration (EBUS-TBNA); (2) no treatment of the lung mass before the chest X-ray examination; (3) lesion diameter >1.5 cm; (4) interval between the imaging examination and the pathological examination <2 weeks; and (5) complete pathological information. Exclusion criteria: (1) ROIs that were blurred and difficult to delineate because of quality-control issues in the digital X-ray imaging process; (2) images read on a grayscale display monitor that could not maintain a viewing brightness of 400 to 500 cd/m², had a resolution below 3 megapixels, or was driven by a graphics card with fewer than 10 bits of grayscale output; and (3) tumors with significant internal necrosis or extensive hemorrhage and no appropriate target area for measurement. The baseline characteristics of the patients are given in Table 1.

Unlike typical studies that categorize Ki-67 expression as high or low, this study divides Ki-67 expression into 4 grades using the criteria below.¹⁴ Ki-67 was assessed with a ready-to-use mouse monoclonal antibody, supplied in liquid form in a buffer containing 1% BSA stabilizer and 0.015% sodium azide, together with an SP immunohistochemistry detection kit. For interpretation, 2 experienced pathologists independently examined the slides: 5 random high-power fields were selected, and 100 cells were counted in each field. Any detectable Ki-67 nuclear staining (dot or diffuse) was considered a positive reaction. Samples with 1% to 10% positive cells were classified as weakly positive (+), 10% to 50% as positive (++), and 50% or more as strongly positive (+++).
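As a sketch of the grading rule above, the hypothetical helpers below compute a labeling index from the 5 × 100 counted cells and map it to a grade. Two choices are assumptions not fixed by the text: indices below 1% are treated as negative (−), and the shared boundaries (exactly 10% and exactly 50%) are assigned to the higher grade.

```python
def labeling_index(positive_per_field, cells_per_field=100):
    """Percentage of Ki-67-positive nuclei across all counted high-power fields."""
    total_cells = cells_per_field * len(positive_per_field)
    return 100.0 * sum(positive_per_field) / total_cells


def ki67_grade(index_percent):
    """Map a labeling index (%) to the 4 grades; thresholds follow the text,
    while '<1% = negative' and boundary handling are assumptions."""
    if index_percent < 1:
        return "-"    # assumed: below the weakly positive range
    if index_percent < 10:
        return "+"    # 1% to <10%: weakly positive
    if index_percent < 50:
        return "++"   # 10% to <50%: positive
    return "+++"      # >=50%: strongly positive


# Example: positive nuclei counted in 5 fields of 100 cells each (hypothetical counts)
counts = [12, 8, 15, 10, 5]
idx = labeling_index(counts)
print(idx, ki67_grade(idx))
```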

Table 1.

Patients’ baseline characteristics.

Ki-67 level	Cases	Gender (male)	Gender (female)	Age (years, $\bar{x} \pm s$)	Subtype ADC^a	Subtype SCC^b
−	27	15	12	48.8 ± 14.4	27	0
+	29	17	12	48.2 ± 14.7	26	3
++	37	21	16	48.0 ± 14.7	28	9
+++	16	12	4	48.7 ± 14.6	11	5
Statistical value		χ² = 1.896		F = 1.615	χ² = 10.716
P value		0.594		0.190	0.013

^a ADC: adenocarcinoma.

^b SCC: squamous cell carcinoma.
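The chi-square statistics for the categorical columns of Table 1 can be re-derived from the counts alone. The plain-Python sketch below (function names are hypothetical) computes the Pearson chi-square statistic for the gender and subtype columns and uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom, which is what a 4 × 2 table has. The age F statistic requires individual ages and is not reproduced.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square distribution with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Rows: Ki-67 level -, +, ++, +++ (counts taken from Table 1)
gender = [[15, 12], [17, 12], [21, 16], [12, 4]]   # male, female
subtype = [[27, 0], [26, 3], [28, 9], [11, 5]]     # ADC, SCC

for name, table in (("gender", gender), ("subtype", subtype)):
    stat, dof = chi_square_independence(table)
    assert dof == 3  # the closed-form tail above is only valid for 3 degrees of freedom
    print(f"{name}: chi2 = {stat:.3f}, P = {chi2_sf_3df(stat):.3f}")
```

Run as written, this prints χ² = 1.896 with P = 0.594 for gender and χ² = 10.716 with P = 0.013 for subtype, matching the table.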

Independent Variable Set

We selected 7 features that are easily observed on X-ray images and are potentially correlated with tumor progression; in the following text, these are referred to as “lung features data.” They are vascular signs, pleural effusion, obstructive pneumonia, multiple lung nodules, homogeneity of lung mass density, lung mass density, and signs of infection. Of these, vascular signs, multiple lung nodules, homogeneity of lung mass density, and signs of infection are binary, indicating only the presence (1) or absence (0) of the feature. The definitions or criteria for each feature are given below.^15-17

Vascular signs: The vascular bundle sign appears on the image as one or more blood vessels reaching into, intersecting with, or passing through the edge of the tumor.

Multiple lung nodules: Defined as the presence of 2 or more lesions within the lung.

Density variation (homogeneity of lung mass density): A lung mass with varying density is coded as 1, and a mass with homogeneous density is coded as 0.

Signs of infection: Defined as ill-defined shadows in the surrounding lung parenchyma, thickening of lung markings, and darkening of the lung fields.

Three features are treated as ordinal variables: pleural effusion, obstructive pneumonia, and lung mass density. Their classification criteria are as follows:

Pleural effusion: Divided into 4 levels: none, slight, moderate, and heavy. This classification is based on the “Chinese Expert Consensus on the Diagnosis of Pleural Effusion” and is supplemented by the clinical experience of medical professionals.

Obstructive pneumonia: Divided into 3 levels based on clinical experience: (1) none, (2) not affecting more than half of the lung area, and (3) affecting more than half of the lung area.

Lung mass density: Divided into 3 levels based on clinical experience: low, medium, and high density. Low density refers to density only slightly greater than that of lung tissue, high density to density approaching (but still below) that of bone, and medium density to values between these 2 extremes.

Feature Values: These 3 features are assigned integer values starting from 0, based on their levels in sequential order.
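A minimal sketch of the coding scheme described above, assuming one dictionary per clinician reading; the field names and level labels are hypothetical stand-ins for whatever the annotation form actually used. Binary features become 1/0 and ordinal features become their 0-based position in the listed order.

```python
# Hypothetical field names; level order follows the criteria in the text.
BINARY_FEATURES = ("vascular_signs", "multiple_nodules", "density_variation", "infection_signs")
ORDINAL_LEVELS = {
    "pleural_effusion": ("none", "slight", "moderate", "heavy"),
    "obstructive_pneumonia": ("none", "not more than half of lung", "more than half of lung"),
    "mass_density": ("low", "medium", "high"),
}


def encode_lung_features(record):
    """Turn one reading into the 7 integer-coded lung features."""
    # Presence -> 1, absence -> 0
    encoded = {name: int(bool(record[name])) for name in BINARY_FEATURES}
    for name, levels in ORDINAL_LEVELS.items():
        encoded[name] = levels.index(record[name])  # 0, 1, 2, ... in increasing order
    return encoded


reading = {
    "vascular_signs": True, "multiple_nodules": False,
    "density_variation": True, "infection_signs": False,
    "pleural_effusion": "moderate", "obstructive_pneumonia": "none", "mass_density": "high",
}
print(encode_lung_features(reading))
```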

To enhance the predictive capability of the model, this study used Python to develop a program for outlining the ROI through an interactive human-computer interface.¹⁸ Shape features of the lung mass were then computed from each outline; their meaning and extraction methods are detailed below.

An interactive image-drawing program was implemented that lets clinicians manually outline the ROI on locally stored images with a mouse. Users click and drag on the image to delineate the desired area, and the program displays the outlined region in real time. This approach directly captures the clinician’s recognition of the lesion and affected area and also provides precise boundary information for subsequent analysis. In Figure 2, the high-density white lines represent the manually drawn ROI. After the outline is complete, the features listed in Table 2 are computed from it (ROI information from Figure 2, results rounded to 2 decimal places).

Figure 2.

(a) Drawing ROI on an X-ray film. (b) Schematic diagram of each feature in ROI data.

Table 2.

ROI data.

ROI area^a	678.95 cm²^b
Minimum bounding circle area of ROI^c	1300.96 cm²
ROI Longest diameter/shortest diameter^d	2.16
Minimum bounding circle area of ROI / ROI area	1.92
Standard deviation^e	2.68 cm
Variance^f	7.23 cm²

^a The area delineated by the clinician for the lung mass in the X-ray projection (AP or PA view). All variables refer to Figure 2b.

^b Each pixel in Figure 2a corresponds to 0.143 cm in the real world.

^c The area of the smallest circle that completely encloses the ROI.

^d The geometric center of the ROI is calculated; every line through this center intersects the ROI boundary at 2 points, and the ratio of the longest to the shortest such chord is reported.

^e Line segments are drawn from the geometric center of the ROI to the boundary (represented by a dense set of points), and the standard deviation of their lengths is calculated.

^f The variance of the same set of segment lengths.
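The following is a minimal sketch, not the authors’ program, of how the Table 2 shape features could be computed from an outline given as a list of pixel coordinates. It assumes a closed, non-self-intersecting outline that is star-shaped with respect to its centroid (so each ray from the centroid meets the boundary once), samples chord directions at 1° steps, converts pixels with the 0.143 cm scale from note b, and uses population rather than sample standard deviation and variance, which the text does not specify. All function names are hypothetical.

```python
import math
import statistics


def polygon_area_and_centroid(points):
    """Shoelace area (pixel^2) and area centroid of a closed outline."""
    twice_area = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the outline
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return abs(twice_area) / 2, (cx / (3 * twice_area), cy / (3 * twice_area))


def ray_hit_distance(center, angle, points, eps=1e-9):
    """Distance from center to the outline along one ray (farthest hit)."""
    ox, oy = center
    dx, dy = math.cos(angle), math.sin(angle)
    best = 0.0
    n = len(points)
    for i in range(n):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        ex, ey = x1 - x0, y1 - y0
        denom = dx * ey - dy * ex
        if abs(denom) < 1e-12:
            continue  # ray parallel to this edge
        wx, wy = x0 - ox, y0 - oy
        t = (wx * ey - wy * ex) / denom  # position along the ray
        u = (wx * dy - wy * dx) / denom  # position along the edge (0..1 = on the edge)
        if t >= 0 and -eps <= u <= 1 + eps:
            best = max(best, t)
    return best


def roi_shape_features(points, cm_per_pixel=0.143, n_chords=180):
    """Hypothetical re-implementation of the Table 2 shape features for one outline."""
    area_px, center = polygon_area_and_centroid(points)
    # Chord through the centroid = edge distance in one direction plus the opposite direction
    chords = [
        ray_hit_distance(center, math.pi * k / n_chords, points)
        + ray_hit_distance(center, math.pi * k / n_chords + math.pi, points)
        for k in range(n_chords)
    ]
    # Centroid-to-boundary distances for the drawn outline points, in cm
    radial_cm = [math.dist(center, p) * cm_per_pixel for p in points]
    return {
        "area_cm2": area_px * cm_per_pixel ** 2,
        "diameter_ratio": max(chords) / min(chords),
        "radial_sd_cm": statistics.pstdev(radial_cm),
        "radial_var_cm2": statistics.pvariance(radial_cm),
    }


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(roi_shape_features(square, cm_per_pixel=1.0))
```

For a square, the chord ratio is the diagonal over the side (√2) and all corners are equidistant from the centroid, so the radial standard deviation is 0; a hand-drawn outline would have many more, unevenly spaced points.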

Because the ratio of the minimum bounding circle area to the ROI area is derived from both measures, and this study aims to explore the relationship between the 2, we excluded the minimum bounding circle area itself to mitigate multicollinearity. The remaining 5 features are referred to as “ROI data.”
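The retained area-ratio feature still needs the minimum enclosing circle. One compact way to obtain it is the randomized incremental (Welzl-style) algorithm sketched below in plain Python; OpenCV’s `cv2.minEnclosingCircle` is an alternative if that library is available. Which method the authors used is not stated, and the helper names here are hypothetical.

```python
import math
import random


def _circle_from_two(a, b):
    """Circle with a and b diametrically opposite."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)


def _circle_from_three(a, b, c):
    """Circumcircle of a, b, c; falls back to the widest 2-point circle if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return max((_circle_from_two(a, b), _circle_from_two(a, c), _circle_from_two(b, c)),
                   key=lambda circle: circle[2])
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), a))


def _contains(circle, p, eps=1e-9):
    return math.dist(circle[:2], p) <= circle[2] + eps


def min_enclosing_circle(points, seed=0):
    """Randomized incremental minimum enclosing circle; returns (center_x, center_y, radius)."""
    pts = list(points)
    random.Random(seed).shuffle(pts)
    circle = (pts[0][0], pts[0][1], 0.0)
    for i in range(1, len(pts)):
        if _contains(circle, pts[i]):
            continue
        circle = (pts[i][0], pts[i][1], 0.0)  # pts[i] must lie on the boundary
        for j in range(i):
            if _contains(circle, pts[j]):
                continue
            circle = _circle_from_two(pts[i], pts[j])  # pts[i] and pts[j] on the boundary
            for k in range(j):
                if not _contains(circle, pts[k]):
                    circle = _circle_from_three(pts[i], pts[j], pts[k])
    return circle


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
_, _, radius = min_enclosing_circle(square)
area_ratio = math.pi * radius ** 2 / 100.0  # 100 = shoelace area of this square in pixel^2
print(round(area_ratio, 4))
```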

In summary, 12 features were identified in 2 groups: 7 features obtained from clinical image reading and 5 features computed from manually delineated ROIs. Because these 5 features are continuous, each was converted into an ordinal variable with levels 1 to 4 according to its quartiles, in ascending order.
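Quartile binning can be sketched with the standard library as below. Because the text states that the test set was isolated from all preprocessing, the sketch assumes the cut points are estimated on the training data only and then applied unchanged to test cases; the quantile method (`"inclusive"`) and the handling of values exactly on a cut point are also assumptions.

```python
import bisect
import statistics


def fit_quartile_cuts(train_values):
    """Three cut points (Q1, median, Q3) estimated on training data only."""
    return statistics.quantiles(train_values, n=4, method="inclusive")


def to_quartile_level(value, cuts):
    """Ordinal level 1-4; a value equal to a cut point falls into the lower level."""
    return bisect.bisect_left(cuts, value) + 1


train_area = [1, 2, 3, 4, 5, 6, 7, 8]  # placeholder values, not study data
cuts = fit_quartile_cuts(train_area)    # [2.75, 4.5, 6.25]
levels = [to_quartile_level(v, cuts) for v in train_area]
print(cuts, levels)
```

Test-set values outside the training range simply land in level 1 or level 4.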

Modeling and Analysis

To identify the models best suited to this dataset, this study used LR, SVM, RF, and DNN for model construction.^19,20 The train/test split used a random state of 6 and a test-set proportion of 30%.

The remaining 70% of the data were used for fivefold cross-validation, and the mean cross-validated performance was used to compare models. Throughout the study, the test set was kept entirely separate from all data preprocessing and cross-validation procedures to ensure unbiased evaluation.
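The split and cross-validation setup described above can be sketched with scikit-learn as follows. The data are random placeholders, not study data; the RF hyperparameters are taken from Appendix A, Table A1, while `random_state=0` for the forest, the absence of stratification in the split, and macro-F1 as the cross-validation score are assumptions. With 109 patients and `test_size=0.3`, `train_test_split` rounds the test-set size up to 33 patients (76 for training), which is consistent with every accuracy in Table 3 being a multiple of 1/33 (for example, 25/33 ≈ 0.758).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data (NOT study data): 109 patients x 12 ordinal features, Ki-67 grades 0-3
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(109, 12))
y = rng.integers(0, 4, size=109)

# 30% hold-out test set with random_state=6, as stated in the text
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=6)

# RF hyperparameters from Table A1; random_state=0 is an added assumption
rf = RandomForestClassifier(
    n_estimators=10, criterion="gini", max_depth=None, min_samples_split=2,
    min_samples_leaf=1, bootstrap=True, class_weight=None, n_jobs=-1, random_state=0,
)

# Fivefold cross-validation on the training portion only
cv_f1 = cross_val_score(rf, X_train, y_train, cv=5, scoring="f1_macro")
print(f"CV F1: {cv_f1.mean():.3f} +/- {cv_f1.std():.3f}")

# Final fit on the whole training set, then one evaluation on the untouched test set
rf.fit(X_train, y_train)
test_accuracy = rf.score(X_test, y_test)
print(X_train.shape, X_test.shape, round(test_accuracy, 3))
```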

We used 4 metrics (accuracy, precision, recall, and F1-score) to evaluate model performance and presented the results as confusion matrices; all values are reported to 3 decimal places.
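The text does not say how precision, recall, and F1-score were averaged over the 4 Ki-67 grades. Assuming macro-averaging (an unweighted mean over classes), the metrics can be computed from a confusion matrix as sketched below. Under macro-averaging the F1-score is the mean of per-class F1 values, so it can fall below both macro precision and macro recall; the toy matrix in the example shows this.

```python
def macro_metrics(cm):
    """Accuracy plus macro-averaged precision, recall and F1 from a square confusion matrix.

    cm[i][j] counts patients whose true grade is i and predicted grade is j.
    A class with no predictions (or no true cases) contributes 0 to that metric.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(row[c] for row in cm)  # column sum
        actual = sum(cm[c])                    # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Toy 3-class matrix (not study data)
cm = [[2, 1, 0],
      [0, 3, 1],
      [0, 0, 3]]
acc, prec, rec, f1 = macro_metrics(cm)
print(f"{acc:.3f} {prec:.3f} {rec:.3f} {f1:.3f}")
```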

This study combined 4 machine learning and neural network algorithms with 3 feature categories, “Lung Features Data,” “ROI Data,” and their combination “Lung Features + ROI Data,” yielding 12 models for comparative analysis. The hyperparameter settings for each model are detailed in Appendix A, Table A1.

Table 3 compares the performance of each method across the 3 feature categories. For LR, RF, and DNN, the combined “lung features data + ROI data” set generally outperforms either feature category alone, whereas for SVM the ROI data alone performs slightly better than the combined set. With the combined features, the Random Forest model outperforms the other 3 models, as shown in Table 4.

Table 3.

Model performance comparison.

Model	Feature set	Accuracy	Precision	Recall	F1-score
LR	Lung features data	0.455	0.500	0.457	0.340
LR	ROI data	0.576	0.519	0.536	0.517
LR	Lung features data + ROI data	0.606	0.546	0.588	0.541
SVM	Lung features data	0.515	0.504	0.494	0.440
SVM	ROI data	0.636	0.680	0.608	0.592
SVM	Lung features data + ROI data	0.606	0.571	0.586	0.555
RF	Lung features data	0.515	0.587	0.559	0.522
RF	ROI data	0.545	0.660	0.547	0.529
RF	Lung features data + ROI data	0.758	0.786	0.780	0.725
DNN	Lung features data	0.394	0.221	0.393	0.262
DNN	ROI data	0.455	0.469	0.433	0.311
DNN	Lung features data + ROI data	0.576	0.469	0.478	0.424

Table 4.

Performance comparison of different models.

Model	Accuracy	Precision	Recall	F1-score
LR	0.606	0.546	0.588	0.541
SVM	0.606	0.571	0.586	0.555
DNN	0.576	0.469	0.478	0.424
RF	0.758	0.786	0.780	0.709

All model development and statistical analyses were conducted using Python (version 3.9). The Mann-Whitney U test was employed to compare continuous variables with non-normal distributions, while the independent samples t-test was utilized for continuous variables with normal distributions. The chi-squared test was applied to categorical variables. Performance comparisons among different models were performed using fivefold cross-validation. A P-value < .05 was considered indicative of a significant difference between the datasets being compared.

Model Optimization

This study includes 12 features. After comparing the performance of various models, the Random Forest algorithm was ultimately selected to construct the predictive model using the combined feature set of “Lung features + ROI data.”

Because some imaging features may be strongly multicollinear, we plotted a correlation heatmap (Figure 3). The heatmap showed that certain features are highly correlated; for example, “standard deviation” correlates with “diameter ratio” and “area ratio” with coefficients of .9 or higher. To retain the features that contribute most to the model and to reduce prediction errors caused by multicollinearity, we plotted a feature importance chart (Figure 4). With the lung features held fixed, we removed ROI features one at a time in increasing order of importance; after testing and comparing the results, we retained the top 4 ROI features. Then, with the ROI features fixed, we removed lower-importance lung features in the same way. Because the feature set was small, every step of this sequence was evaluated, and elimination stopped when the model’s F1-score decreased; the features retained were those present immediately before the drop.
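The collinearity screen behind Figure 3 can be sketched as a pairwise correlation check against the 0.9 threshold mentioned above. Pearson correlation is assumed here (the heatmap’s coefficient type is not stated), and the column names and values in the example are hypothetical. A column with zero variance would need to be removed first, since its correlation is undefined.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.9):
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in itertools.combinations(columns.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical columns, not study data
columns = {
    "sd": [1, 2, 3, 4],
    "diameter_ratio": [2, 4, 6, 8.5],
    "mass_density": [4, 1, 3, 2],
}
print(collinear_pairs(columns))
```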

Figure 3.

Correlation heatmap: (a) RF model based on 12 features, (b) RF model based on 8 features after feature selection.

Figure 4.

Confusion matrix and feature importance ranking graphs (axes labeled 0-3 represent Ki-67 levels − to +++): (a) RF model based on 12 features, (b) RF model based on 8 features after feature selection.

Here, the mean F1-score obtained through cross-validation on the training set was utilized to evaluate changes in model performance. Although there was no significant difference in the F1-score before and after excluding certain features, we decided to retain or remove features based on the observed trends in the mean values. The model’s performance was validated using an independent test set.
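The two-stage elimination described above can be sketched as a single helper applied twice: first to the ROI features with the lung features fixed, then to the lung features with the retained ROI features fixed. `score_fn` stands for the mean cross-validated F1-score on the training set (for example, the mean of scikit-learn’s `cross_val_score` with `scoring="f1_macro"`); it is a hypothetical callback, ties are treated as “no decrease” so elimination continues, and always keeping at least 1 feature is an added assumption.

```python
def eliminate_by_importance(removable, fixed, score_fn):
    """Backward elimination guided by feature importance.

    removable: features that may be dropped, ordered from least to most important.
    fixed:     features kept unchanged during this stage.
    score_fn:  callable returning a score (e.g. mean CV F1) for a list of features.
    Features are dropped one at a time, least important first; elimination stops as
    soon as the score decreases, and the set before the drop is returned.
    """
    kept = list(removable)
    best = score_fn(list(fixed) + kept)
    while len(kept) > 1:  # assumption: never drop the last removable feature
        candidate = kept[1:]  # drop the least important remaining feature
        score = score_fn(list(fixed) + candidate)
        if score < best:
            break
        kept, best = candidate, score
    return kept


# Hypothetical scorer standing in for cross-validated F1
def toy_score(features):
    return 0.70 + 0.10 * ("x2" in features) + 0.05 * ("x3" in features) - 0.01 * ("x1" in features)


roi_kept = eliminate_by_importance(["x1", "x2", "x3"], fixed=["lung_a", "lung_b"], score_fn=toy_score)
print(roi_kept)
```

With this toy scorer, dropping `x1` raises the score, while dropping `x2` lowers it, so `["x2", "x3"]` is kept.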

Ultimately, 8 features that contributed significantly to the prediction results were retained. The performance of the optimized model is compared with that of the original model in Table 5 and Figure 4.

Table 5.

Performance comparison between new and original RF models.

Model	Accuracy	Precision	Recall	F1-score, cross-validation (95% CI)	F1-score (test)
RF_original	0.758	0.786	0.780	0.709 (0.496-0.873)	0.725
RF_new	0.818	0.823	0.849	0.783 (0.628-0.918)^a	0.793

^a P = .046 versus RF_original.

Feature Contribution Analysis

After training the random forest model, we used SHapley Additive exPlanations (SHAP) plots to identify which features contribute most to the model’s predictions at each Ki-67 expression level (Figure 5). Across the 4 SHAP plots (one per Ki-67 level), ROI-derived features occupied 7 of the 12 top-3 positions (58.3%). In particular, ROI area was the most influential feature when Ki-67 expression was negative (−), weakly positive (+), or strongly positive (+++).
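SHAP values are Shapley values from cooperative game theory: each feature’s contribution is its marginal effect on the model output, averaged over all subsets of the other features. For tree ensembles, the `shap` package computes them efficiently; the brute-force enumeration below only illustrates the definition on a hypothetical two-feature value function (not the study model) and is exponential in the number of features.

```python
from itertools import combinations
from math import factorial


def exact_shapley(features, value_fn):
    """Exact Shapley value of each feature for a set function value_fn(frozenset)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


# Hypothetical value function: model output when only the features in `s` are "known"
def toy_value(s):
    both = {"roi_area", "pleural_effusion"} <= s
    return 2.0 * ("roi_area" in s) + 1.0 * ("pleural_effusion" in s) + 3.0 * both


phi = exact_shapley(["roi_area", "pleural_effusion"], toy_value)
print(phi)
```

The interaction term (3.0) is split equally between the 2 features, and the values sum to the full-model output minus the empty-set output (6.0), the “efficiency” property that SHAP plots rely on.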

Figure 5.

(a-d) correspond to the SHAP plots of various features when Ki-67 expression levels are −, +, ++, and +++.

Results

Through model comparison and feature selection, this study identified 8 features from the 2 types of data for predicting Ki-67 levels. Mantel-Haenszel chi-square tests showed a linear relationship between Ki-67 level and X-ray lung mass area (χ² = 37.856, P < .001), the standard deviation of the distance between the mass boundary and its center (χ² = 12.279, P < .001), the variance of that distance (χ² = 39.081, P < .001), lung mass density (χ² = 17.071, P < .001), the presence of multiple lung nodules (χ² = 14.746, P < .001), vascular signs (χ² = 8.269, P = .004), and pleural effusion (χ² = 8.100, P = .004). No linear relationship was observed between Ki-67 level and signs of infection (χ² = 0.683, P = .409). The statistical results are presented in descending order of importance.²¹ The SHAP analysis suggests that manually delineated ROIs compensate for limitations of conventional imaging features, because they play a significant role in the model’s decision-making. Integrating the 2 types of features substantially improves the model’s ability to perform graded (multi-class) prediction.
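The Mantel-Haenszel test for linear trend between 2 ordinal variables is commonly computed as M² = (N − 1)r², where r is the Pearson correlation between the category scores and M² is referred to a chi-square distribution with 1 degree of freedom. The sketch below assumes integer scores (0, 1, 2, and so on) for both Ki-67 grade and each feature; the scoring actually used is not stated. As a consistency check, the 1-df tail reproduces the reported P values from the reported statistics.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def linear_by_linear(x_scores, y_scores):
    """Mantel-Haenszel linear-trend statistic M^2 = (N - 1) r^2 and its P value."""
    n = len(x_scores)
    mx, my = sum(x_scores) / n, sum(y_scores) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_scores, y_scores))
    sxx = sum((a - mx) ** 2 for a in x_scores)
    syy = sum((b - my) ** 2 for b in y_scores)
    r = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * r * r
    return m2, chi2_sf_1df(m2)


# Check the reported statistics against their reported P values
for stat in (8.269, 8.100, 0.683):
    print(stat, round(chi2_sf_1df(stat), 3))
```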

Comparing the performance of the 4 models (accuracy, precision, recall, F1-score) with fused features gave the following results: LR (0.606, 0.546, 0.588, 0.541); SVM (0.606, 0.571, 0.586, 0.555); DNN (0.576, 0.469, 0.478, 0.424); RF (0.758, 0.786, 0.780, 0.725). RF was selected for final model construction. We then removed features from the “lung features data + ROI data” set that interfered with the model’s predictions. In cross-validation, the original RF model retaining all features achieved an F1-score of 0.709 (95% CI: 0.496-0.873), whereas the feature-selected RF model achieved 0.783 (95% CI: 0.628-0.918). The difference between the 2 models was statistically significant (t = −2.357, P = .046). The optimal RF model achieved an accuracy of 0.818, a precision of 0.823, a recall of 0.849, and an F1-score of 0.783; the F1-score on the test set was 0.793.
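The test behind t = −2.357, P = .046 is not named. For an even number of degrees of freedom, Student’s t distribution has a closed-form tail, which the sketch below uses to check the reported value. With 8 degrees of freedom, as for an unpaired two-sample t-test on 5 fold scores per model (an assumption here), the two-sided P value is about .046.

```python
import math


def t_two_sided_p_even_df(t, df):
    """Two-sided P value for Student's t with an even number of degrees of freedom.

    Uses P(|T| < t) = sin(theta) * (1 + (1/2) cos^2 + (1*3)/(2*4) cos^4 + ...),
    with theta = atan(|t| / sqrt(df)) and terms up to cos^(df - 2).
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    theta = math.atan(abs(t) / math.sqrt(df))
    cos2 = math.cos(theta) ** 2
    term = series = 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2
        series += term
    return 1.0 - math.sin(theta) * series


print(round(t_two_sided_p_even_df(-2.357, 8), 3))
```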

Discussion

This study innovatively applies a method similar to “target area delineation” to feature extraction. Manually delineating lung mass ROIs offers higher objectivity and precision than the traditional measurement of length and width, partially compensating for the information lost in manual image reading. The SHAP plots show that, among the 8 selected features, those derived from manually delineated ROIs make up nearly half and rank highly by SHAP value. This suggests that these features contribute substantially to the model’s predictions.^22,23

Regarding the 2 categories of selected features, pleural effusion, a traditional clinical pulmonary characteristic, is a significant predictor of Ki-67 level. This aspect has been examined in detail in the Chinese expert consensus on the treatment of malignant pleural effusion.²⁴ In this consensus, Yao et al consistently assert that pleural involvement is a crucial factor in predicting Ki-67 level.^25,26 Concerning ROI features, previous CT radiomics studies by Sun, Fu, et al revealed a relationship between morphological features and Ki-67 level. However, because these studies used nomograms, all imaging features were aggregated into a single numerical value (threshold), which limited exploration of the relationships between individual morphological features and Ki-67 level. In the present study, 3 ROI-derived X-ray features were incorporated into the final predictive model and ranked highly in importance: “ROI area,” “standard deviation,” and “variance.” The “ROI area” feature suggests that tumors with faster proliferation rates are more likely to progress locally within a short period rather than grow slowly, in line with the principle that higher tumor malignancy correlates with greater peripheral invasion. The remaining 2 features capture the irregularity of lung tumor shape. This measurement reflects several factors that influence tumor proliferation, including local biological heterogeneity; the uneven distribution of surrounding bronchi and blood vessels, which causes variations in oxygen and nutrient supply; and differences in the immune microenvironment caused by the irregular distribution of lymph nodes and immune cells. These and other potential factors produce different proliferation rates in different regions of a single tumor, which manifest as morphological diversity.
The rapid proliferation of particular tumor regions makes the overall tumor shape irregular, which is why “standard deviation” and “variance” are important features for predicting Ki-67 level. This finding supports further standardizing and universally defining these features, and exploring additional radiomic characteristics, in future research.

Although traditional length and width measurements are straightforward and practical, they are prone to operator subjectivity, which undermines the reliability and consistency of the results. Across the models tested, manual delineation of lung mass ROIs improved the prediction of Ki-67 levels from imaging features. Both precision and recall reached relatively high levels, which may reduce the risk of diagnostic errors.

In this study, we innovatively categorized Ki-67 expression into 4 distinct grades, moving away from the traditional binary classification approach, such as the 5% threshold proposed by Huang et al to differentiate between invasive and non-invasive lung adenocarcinomas.²⁷ This allows a more precise assessment of tumor progression and facilitates comparative analysis. The outcomes of binary models typically depend on the chosen Ki-67 threshold: if the threshold is set at 20%, tumors with Ki-67 levels of 1% and 20% can fall into the same category, which is hard to justify, and such predictions contribute little to clinical practice. To address this issue and make fuller use of the predictive information in Ki-67 levels, this study accepts a slight decrease in model performance compared with binary models. However, the new predictive indicators, the comparison of multiple models, and variable screening prevent this from compromising the overall performance of the model.

Of course, the reliability of this model depends largely on the imaging expertise of those delineating the ROIs.²⁸ Poor delineation practices include drawing lung mass boundaries that are substantially larger or smaller than the actual mass, marking areas of inflammation, effusion, or other non-mass regions as ROIs, and not adhering strictly to the established boundaries during delineation. Inaccurate ROI delineation can lead to misestimation of the tumor’s Ki-67 level and may cause clinicians to misinterpret the patient’s condition. We therefore recommend that at least 2 qualified clinicians delineate the ROIs and that, after obvious drawing errors are excluded, the average of their results be used to estimate the Ki-67 level.

Overall, predicting Ki-67 expression from radiological images is an important non-invasive way to assess Ki-67 levels, but low accuracy and coarse classification have impeded its clinical application. To address these issues, this study combines manually delineated tumor ROIs with traditional radiological features for feature extraction. By employing several machine learning models together with feature selection, we improve the precision of multi-class Ki-67 prediction. This approach may help bring radiomics into clinical diagnostics.

Conclusions

Manual delineation of NSCLC ROIs provides greater objectivity and precision compared to traditional length and width measurements, which are susceptible to personal biases that can compromise reliability. By integrating clinical imaging features with mathematical characteristics derived from manually delineated ROIs in radiomics, the loss of information due to human interpretation is minimized. This integration allows for the objective quantification of tumor characteristics, thereby enhancing assessment accuracy. This study employed manual ROI delineation to extract features of NSCLC, combined with traditional imaging attributes, and applied machine learning techniques to improve the prediction of Ki-67 expression levels in lung cancer. This approach assists physicians in accurately assessing conditions and prognoses, reduces financial burdens on patients, and minimizes radiation exposure. It holds significant clinical potential as a diagnostic and prognostic tool for evaluating the proliferation rate of NSCLC.

Footnotes

Appendix A

Table A1.

Hyperparameter settings for different models.

LR
multi_class	multinomial
solver	lbfgs^a
class_weight	balanced
max_iter	1000
SVM
C	1.0
kernel	linear
degree	3
gamma	scale
class_weight	balanced
decision_function_shape	ovr
RF
n_estimators	10
criterion	gini^b
max_depth	None
min_samples_split	2
min_samples_leaf	1
bootstrap	True
class_weight	None
n_jobs	-1
DNN
Layer (type)	Output shape
dense_8 (Dense)	(None, 4)
dense_9 (Dense)	(None, 64)
dense_10 (Dense)	(None, 64)
dense_11 (Dense)	(None, 4)
Optimizer	SGD(learning_rate = 0.1)
Loss	categorical_crossentropy

^a Optimization uses the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm.²⁹

^b Gini impurity criterion; usually better than entropy-based information gain.³⁰

Acknowledgements

Thanks to Hongjiang Liu for his assistance with software installation and debugging for this paper. We extend our heartfelt gratitude to Tina Zhang for her invaluable suggestions regarding the writing of this article. We would like to express our gratitude to Associate Chief Physician Fengmei Qiu for her invaluable assistance in interpreting the standards for the grayscale display monitor used in this study.

ORCID iDs

Yizhi Li

Jia Zhang

Xiaodan Lin

Ethical Considerations

This study has received approval from the Medical Ethics Committee of the Affiliated Cancer Hospital and Institute of Guangzhou Medical University, under project number 2024-ZN030.

Consent to Participate

Written informed consent was obtained for each participant according to ICMJE guidelines.

Author Contributions

Yizhi Li: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing - Original Draft, Writing - Review & Editing, Visualization.

Jia Zhang: Data Curation, Writing - Original Draft.

Xiaodan Lin: Writing - Original Draft, Writing - Review & Editing, Supervision, Project administration.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Folescu

Levai

Grigoraş

, et al. Expression and significance of Ki-67 in lung cancer. Rom J Morphol Embryol. 2018;59(1):227-233.

Pan

, et al. Tumor cell proliferation (Ki-67) expression and its prognostic significance in histological subtypes of lung adenocarcinoma. Lung Cancer. 2021;154:69-75.

Eapen

Shah

Lei

, et al. Complications, consequences, and practice patterns of endobronchial ultrasound-guided transbronchial needle aspiration: results of the AQuIRE registry. Chest. 2013;143(4):1044-1053.

Ialongo

Ciarpaglini

Tinti

Suadoni

Cardillo

Systemic air embolism as a complication of percutaneous computed tomography guided transthoracic lung biopsy. Ann R Coll Surg Engl. 2017;99(6):e174-e176.

Smit

Kleijn

de Voogt

WG.

Coronary and cerebral air embolism: a rare complication of computed tomography-guided transthoracic lung biopsy. Neth Heart J. 2013;21(10): 464-466.

Tavare

Patel

Saini

Creer

Hare

SS.

Systemic air embolism as a complication of percutaneous lung biopsy. Br J Hosp Med. 2018;79(2):106-107.

Luo

Zheng

Zhang

, et al. CT-based radiomics for predicting Ki-67 expression in lung cancer: a systematic review and meta-analysis. Front Oncol. 2024;14:1329801.

Wang

Liang

Cui

LL.

Correlation of FDG PET/CT, tumor markers and Ki-67 index with EGFR mutation or positive ALK expression in patients with non-small cell lung cancer. Q J Nucl Med Mol Imaging. 2024;68:169-175.

Grala

Markiewicz

Kozłowski

, et al. New automated image analysis method for the assessment of Ki-67 labeling index in meningiomas. Folia Histochem Cytobiol. 2009;47(4):587-592.

10.

Syed

Zoga

AC.

Artificial intelligence in radiology: current technology and future directions. Semin Musculoskelet Radiol. 2018;22(5):540-545.

11. Gore JC. Artificial intelligence in medical imaging. Magn Reson Imaging. 2020;68:A1-A4.

12. Currie, Hawk, Rohren, Vial, Klein. Machine learning and deep learning in medical imaging: intelligent imaging. J Med Imaging Radiat Sci. 2019;50(4):477-487.

13. Moons, Altman, Reitsma, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1-73.

14. Scholzen, Gerdes. The Ki-67 protein: from the known and the unknown. J Cell Physiol. 2000;182(3):311-322.

15. MacMahon, Naidich, Goo, et al. Guidelines for management of incidental pulmonary nodules detected on CT images: from the Fleischner society 2017. Radiology. 2017;284(1):228-243.

16. Gould, Donington, Lynch, et al. Evaluation of individuals with pulmonary nodules: when is it lung cancer? Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest. 2013;143(5 Suppl):e93S-e120S.

17. Bai, Choi, Chu, et al. Evaluation of pulmonary nodules: clinical practice consensus guidelines for Asia. Chest. 2016;150(4):877-893.

18. Tanaka. [[Fundamentals] 5. Python+scikit-learn for machine learning in medical imaging]. Nippon Hoshasen Gijutsu Gakkai Zasshi. 2023;79(10):1189-1193.

19. Steyerberg, Eijkemans, Harrell FE, Habbema JD. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med. 2000;19(8):1059-1079.

20. Avanzo, Wei, Stancanello, et al. Machine and deep learning methods for radiomics. Med Phys. 2020;47(5):e185-e202.

21. Woolson, Bean JA. Mantel-Haenszel statistics and direct standardization. Stat Med. 1982;1(1):37-39.

22. Bifarin, Fernandez FM. Automated machine learning and explainable AI (AutoML-XAI) for metabolomics: improving cancer diagnostics. bioRxiv. 2023.

23. Zhang, Shi, Yin, et al. A machine learning model based on ultrasound image features to assess the risk of sentinel lymph node metastasis in breast cancer patients: applications of scikit-learn and SHAP. Front Oncol. 2022;12:944569.

24. Gregorutti, Michel, Saint-Pierre. Correlation and variable importance in random forests. Stat Comput. 2017;27(3):659-678.

25. Yao, Liao, et al. Noninvasive method for predicting the expression of ki67 and prognosis in non-small-cell lung cancer patients: radiomics. J Healthc Eng. 2022;2022:7761589.

26. Ahn, Jung, et al. Clinical significance of Ki-67 and p53 expression in curatively resected non-small cell lung cancer. Tumour Biol. 2014;35(6):5735-5740.

27. Huang, Lyu, et al. Pre-operative prediction of Ki-67 expression in various histological subtypes of lung adenocarcinoma based on CT radiomic features. Front Surg. 2021;8:736737.

28. van Timmeren, Cester, Tanadini-Lang, Alkadhi, Baessler. Radiomics in medical imaging-"how-to" guide and critical reflection. Insights Imaging. 2020;11(1):91.

29. Chang, Sun, Zhang. An accelerated linearly convergent stochastic L-BFGS algorithm. IEEE Trans Neural Netw Learn Syst. 2019;30(11):3338-3346.

30. Menze, Kelm, Masuch, et al. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics. 2009;10:213.

Manual Delineation of the Region of Interest Combined With Clinical Image Analysis to Predict the Ki-67 Expression Level in Non-small Cell Lung Cancer
