Abstract

Objectives

This two-center study aimed to establish a model for predicting the risk of lymph node metastasis in gastric cancer patients using machine learning (ML) and logistic regression (LR) algorithms, and to evaluate its predictive performance in clinical practice.

Methods

Data from 369 patients who underwent radical gastrectomy in the Department of General Surgery of the Affiliated Hospital of Xuzhou Medical University (Xuzhou, China) from March 2016 to November 2019 were collected and retrospectively analyzed as the training group. In addition, data from 123 patients who underwent radical gastrectomy in the Department of General Surgery of Jining First People's Hospital (Jining, China) were collected and analyzed as the verification group. Seven models were developed, namely decision tree, random forest, support vector machine (SVM), gradient boosting machine (GBM), naive Bayes, neural network, and LR, to predict the occurrence of lymph node metastasis in patients with gastric cancer. Each model was trained with 10-fold cross-validation within the training dataset and subsequently assessed using the test dataset. Model performance was evaluated by comparing the area under the receiver operating characteristic curve of each model.

Results

Compared with the traditional logistic model, the ML models other than SVM exhibited higher accuracy and reliability, and the influences of the various risk factors on the model were more intuitive.

Conclusion

For the prediction of lymph node metastasis in gastric cancer patients, the ML algorithm outperformed traditional LR, and the GBM algorithm exhibited the most robust predictive capability.

Keywords

machine learning, prediction model, gastric cancer, lymph node metastasis

Introduction

Gastric cancer is one of the most common malignant tumors of the digestive system, ranking sixth in the world in incidence and fourth in mortality.¹ At present, gastric cancer is typically managed with comprehensive treatment that includes surgery. However, the overall 5-year survival rate remains below 50%.² Lymph node metastasis is reported in 3%-20% of patients with early-stage gastric cancer. Therefore, lymph node dissection plays a crucial role in determining the most suitable treatment approach. The stage of lymph node dissection is also a significant evaluation criterion in gastric cancer treatment.^3,4 In recent years, as the role of adjuvant chemotherapy in preventing recurrence has become increasingly evident, accurately determining the extent of lymph node metastasis following gastric cancer surgery has become of paramount importance. This determination guides the selection of appropriate adjuvant therapy and ultimately contributes to improving the patient's prognosis. In the TNM staging system of the American Joint Committee on Cancer (AJCC), N represents the number of lymph node metastases, which is itself an independent factor in predicting the overall survival rate of gastric cancer patients.^5–7 However, lymph nodes are difficult to assess in patients with gastric cancer: the multiple regional lymph nodes are located in the abdominal cavity and are not easy to explore preoperatively. In addition, 42.5% of metastatic lymph nodes in gastric cancer belong to the nodule type and peripheral type, restricting the application of imaging diagnosis.^8,9

Artificial intelligence refers to the ability of machines to independently replicate typical human intellectual processes.¹⁰ Artificial intelligence has various applications in the medical field, encompassing image processing, computer vision, machine learning (ML), artificial neural networks (NNET), convolutional NNETs, and deep learning.^11,12 Among them, ML algorithms play crucial roles in assisting diagnosis and predicting prognosis by processing large amounts of complex medical data.^13,14 Luo, Ma, et al have applied ML and deep learning to esophageal and gastric varices in patients with cirrhosis.^15,16 The clinical utility of ML within the realm of artificial intelligence is increasingly attracting clinicians’ attention, and ML is also being applied to clinical models of various diseases, including gastric cancer. The present study aimed to explore the differences between the clinical model established by the ML algorithm and the traditional logistic regression (LR) model in predicting lymph node metastasis in patients with gastric cancer.

Materials and Methods

A total of 369 patients who underwent radical gastrectomy in the Department of General Surgery of the Affiliated Hospital of Xuzhou Medical University (Xuzhou, China) from March 2016 to November 2019 were enrolled as the training group, and 123 patients who underwent radical gastrectomy in the Department of General Surgery of Jining First People's Hospital (Jining, China) were enrolled as the verification group (Figure 1). The inclusion criteria were as follows: (a) patients with primary gastric cancer who were initially diagnosed in the two hospitals and were treated with concurrent surgery; (b) confirmation of lymph node metastasis by imaging and pathology; (c) availability of the maximum tumor diameter, nerve or vascular invasion, and the depth of tumor invasion from the postoperative pathology, with other indicators obtained from medical records and preoperative blood tests; (d) no preoperative anti-tumor therapy, such as radiotherapy or chemotherapy; and (e) complete clinical data. The exclusion criteria were as follows: (a) combination with other malignant tumors; (b) the presence of other infectious diseases, blood system diseases, autoimmune diseases, or other disease complications that might affect inflammatory indicators preoperatively; (c) current or recent anti-inflammatory or immunosuppressive therapy; (d) preoperative transfusion treatment; (e) severe liver or kidney insufficiency; and (f) incomplete clinical data. The reporting of this study conforms to TRIPOD guidelines.¹⁷

Figure 1.

The study flowchart.

Observational Indicators

Clinical data, such as name, age, gender, and other clinicopathological data, including routine blood parameters, tumor location, maximum tumor diameter, depth of invasion, and the presence or absence of lymph node metastasis, were collected from all patients. Blood samples were drawn in the morning, with the patient fasting, on the day after admission and were analyzed using the Sysmex XE-2100 automatic blood analyzer and associated reagents to determine neutrophil count, platelet count, monocyte count, lymphocyte count, and carcinoembryonic antigen (CEA) level. The pan-immune-inflammation value (PIV) and CEA level were also utilized to establish clinical prediction models. PIV was calculated as follows: (neutrophil count × platelet count × monocyte count)/lymphocyte count.
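The PIV formula above can be written as a short function. This is a minimal sketch: the function name and the example counts are hypothetical and chosen only for illustration, and all four counts are assumed to be expressed in the same unit.

```python
def pan_immune_inflammation_value(neutrophils, platelets, monocytes, lymphocytes):
    """PIV = (neutrophils x platelets x monocytes) / lymphocytes.

    All counts are assumed to share one unit (for example 10^9 cells/L).
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils * platelets * monocytes / lymphocytes


# Hypothetical counts, not taken from the study data.
example_piv = pan_immune_inflammation_value(
    neutrophils=4.0, platelets=200.0, monocytes=0.5, lymphocytes=2.0
)
```

Because the lymphocyte count is the denominator, a low lymphocyte count raises PIV sharply, which is why the sketch rejects non-positive values instead of returning infinity.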

Statistical Analysis

Continuous variables were expressed as mean ± standard deviation, and categorical variables were presented as counts and percentages. LR was employed to analyze the independent risk factors associated with lymph node metastasis in gastric cancer patients. This analysis allowed for the calculation of odds ratios (ORs) and their corresponding 95% confidence intervals (CIs). An OR greater than 1 indicated that the variable was a risk factor for the outcome, while an OR less than 1 suggested that the variable was a protective factor. Statistical significance was defined as a P-value of less than 0.05. The statistical analysis and modeling procedures were carried out using SPSS 20.0 software (IBM, Armonk, NY, USA) and R-Studio 25.0 software (R Foundation for Statistical Computing, Vienna, Austria). Several packages were utilized to train models and draw relevant graphs, with the caret package applied for training and validating ML models. Including the fundamental linear model (LR), 7 models were fitted: LR, random forest (RF), gradient boosting machine (GBM), decision tree (DT), support vector machine (SVM), naive Bayes (NB), and NNET, as illustrated in Figure 2.
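The relationship between a logistic coefficient and its OR can be illustrated with a short sketch. It assumes the conventional Wald interval, OR = exp(B) with 95% CI exp(B ± 1.96 × SE). The article does not state which interval method the software used, so this is an assumption, and the function name is hypothetical. The inputs are the rounded B and SE reported for nerve or vascular invasion in Table 3.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) using a Wald interval on the log-odds scale."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# B and SE for nerve or vascular invasion in the multivariate model (Table 3).
or_value, ci_low, ci_high = odds_ratio_with_ci(beta=1.913, se=0.325)
```

With these rounded inputs the result agrees with the tabulated OR of 6.772 (3.582–12.804) to within rounding error, which suggests the table follows the same convention.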

Figure 2.

Evaluation of predictive performance of each model based on AUC value.

The training dataset was combined with the validation dataset, and 7 ML algorithms were employed to establish the prediction model. LR is a classification algorithm designed to establish a relationship between a feature and the probability of a specific outcome. Rather than using linear regression to estimate class probability, it employs an S-shaped (sigmoid) function for modeling.^18,19 DT is primarily utilized for classification tasks. It begins at the root node and splits the dataset on the most informative feature, creating decision points that segment the data into distinct classes.²⁰ RF is an extension of the DT method and functions as an ensemble approach. It generates multiple DTs, with the majority vote from these trees determining the final class prediction of the model.^21,22 NNET is an ML algorithm inspired by biological neural networks. Artificial NNETs consist of interconnected nodes that communicate through connections.^23,24 SVM classifies data by defining boundaries that separate classes. The optimization process aims to maximize the margin between these class boundaries. While SVMs often outperform LR, their computational complexity may lead to longer training time during model development.^25,26 GBM is a boosting technique that serves as a numerical optimization algorithm for constructing additive models that minimize loss functions.^27,28 NB is a straightforward classification algorithm that calculates the probability of each category's occurrence given the item to be classified. The item is assigned to the category with the highest probability.^29,30
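The S-shaped function that LR relies on can be sketched as follows. This is an illustrative example only: the function names, the intercept, and the coefficient passed in the usage line are hypothetical, since the article does not report the fitted intercept of its LR model.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predicted_probability(intercept, coefficients, features):
    """Probability of the positive class for one patient under a fitted LR model."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(linear)


# Hypothetical model with one binary predictor (intercept -1.0, coefficient 2.0).
p_with_factor = predicted_probability(-1.0, [2.0], [1])
p_without_factor = predicted_probability(-1.0, [2.0], [0])
```

The linear predictor can take any real value, but the sigmoid compresses it into a probability, so a positive coefficient always raises the predicted risk without ever pushing it past 1.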

Performance evaluation of the model involved various metrics, including accuracy, recall, and other indicators. The primary indicator for predicting binary classification results was the area under the receiver operating characteristic curve (AUC-ROC). This metric varies from 0 to 1, with higher values signifying superior performance. Additionally, for models with two outcomes, the area under the precision-recall curve was utilized, illustrating the trade-off between recall (the true positive rate) and precision (the positive predictive value), together with the F1 score, defined as the harmonic mean of recall and precision. Each model underwent 10-fold cross-validation on the training set, and its performance was then assessed on the test set. Based on the optimal model, a web-based estimator was developed to facilitate disease prediction using patient data. This estimator enables surgeons to assess the risk of lymph node metastasis in gastric cancer patients.
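As a rough illustration of the primary metric, the AUC can be computed directly from its probabilistic definition: the chance that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below uses that pairwise definition and also includes the harmonic-mean F1 score. The function names and the toy labels and scores are hypothetical and are not study data.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: 3 positives and 3 negatives with hypothetical predicted scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In the toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred patients; rank-based formulas give the same value faster.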

Results

Baseline clinical data in the training group and the verification group

The comparison of clinical data between the two groups is shown in Table 1. Gender, tumor location, age, and surgical method exhibited no significant differences between the two groups (P > .05). In the training dataset, the proportions of total gastrectomy, neurovascular invasion, and maximum tumor diameter >5 cm were significantly higher in patients with lymph node metastasis than in those without lymph node metastasis (P < .05). In the verification dataset, the proportions of patients aged >60 years, with neurovascular invasion, and with maximum tumor diameter >5 cm were significantly greater in patients with lymph node metastasis than in those without lymph node metastasis (P < .05).

Table 1.

Comparison of Clinical Data Between the Two Groups.

Clinical Data	Training Set		t/Z/χ²	P	Validation Set		t/Z/χ²	P
Clinical Data	No Lymph Node Metastasis(n = 141)	Lymph Node Metastasis(n = 228)	t/Z/χ²	P	No Lymph Node Metastasis(n = 51)	Lymph Node Metastasis(n = 72)	t/Z/χ²	P
Gender			1.017	.313			1.126	.289
Male	99(70.2)	171(75.0)			33(64.7)	53(73.6)
Female	42(29.8)	57(25.0)			18(35.3)	19(26.4)
Age			0.015	.901			4.729	.030
≤60y	64(45.4)	105(46.1)			27(52.9)	24(33.3)
>60y	77(54.6)	123(53.9)			24(47.1)	48(66.7)
Mode of operation			7.816	.005			3.578	.059
Partial stomach	113(80.1)	152(66.7)			43(84.3)	50(69.4)
Whole stomach	28(19.9)	76(33.3)			8(15.7)	22(30.6)
Tumor invasion depth			−11.022	<.001			−7.114	<.001
T1	61(43.3)	13(5.7)			30(58.8)	4(5.6)
T2	42(29.8)	22(9.6)			13(25.5)	13(18.1)
T3	21(14.9)	64(28.1)			6(11.8)	27(37.5)
T4	17(12.1)	129(56.6)			2(3.9)	28(38.9)
Tumor site			0.716	.699			0.392	.822
Body of stomach	24(17.0)	32(14.0)			18(35.3)	22(30.6)
Antrum of stomach	73(51.8)	126(55.3)			26(51.0)	38(52.8)
Cardia of stomach	44(31.2)	70(30.7)			7(13.7)	12(16.7)
Nerve or vascular invasion			128.649	<.001			54.772	<.001
No	108(76.6)	39(17.1)			42(82.4)	11(15.3)
Yes	33(23.4)	189(82.9)			9(17.6)	61(84.7)
Maximum tumor diameter			38.634	<.001			8.323	.004
≤5cm	122(86.5)	126(55.3)			46(90.2)	49(68.1)
>5cm	19(13.5)	102(44.7)			5(9.8)	23(31.9)
PIV	132.00(80.73,226.80)	190.72(106.49,311.44)	−3.606	<.001	149.43(91.73,217.49)	173.59(102.20,274.73)	−1.586	.113
CEA	2.47(1.53,3.58)	2.90(1.82,6.87)	−3.189	.001	2.65(1.47,3.95)	4.91(1.97,9.02)	−2.331	.020

CEA, carcinoembryonic antigen; PIV, pan-immune-inflammation value.

The results of the Mann-Whitney U test revealed no statistically significant differences in the depth of infiltration, PIV, and CEA level between the two groups (P > .05). In the verification dataset, the depth of infiltration and CEA level in patients with lymph node metastasis were significantly higher than those in patients without lymph node metastasis (P < .05). In the training dataset, the infiltration depth, PIV, and CEA level in patients with lymph node metastasis were significantly greater than those in patients without lymph node metastasis (P < .05).

LR analysis of factors influencing lymph node metastasis

Table 2 presents the results of univariate LR analysis of factors influencing lymph node metastasis. A binary LR model was established, with the presence of lymph node metastasis as the dependent variable (yes = 1, no = 0). Gender, age, surgical method, degree of tumor invasion, tumor location, nerve vessel invasion, maximum tumor diameter, PIV, NWR, and CEA level were considered as independent variables (inclusion criterion 0.05, exclusion criterion 0.10). The findings indicated significant associations between the surgical approach, degree of tumor invasion, nerve vessel invasion, maximum tumor diameter, PIV, CEA level, and the occurrence of lymph node metastasis (P < .05).

Table 3 presents the results of multifactor LR analysis regarding lymph node metastasis. In this analysis, lymph node metastasis was considered as the dependent variable (yes = 1, no = 0), and a binary LR model was established using the entry method. The independent variables included surgical approach, degree of tumor invasion, neurovascular invasion, maximum tumor diameter, PIV, and CEA level, with an inclusion criterion of 0.05 and an exclusion criterion of 0.10. The findings indicated that the degree of tumor invasion, neurovascular invasion, and maximum tumor diameter demonstrated statistical significance (P < .05). These factors were identified as risk factors influencing the occurrence of lymph node metastasis in patients (OR > 1). Specifically, a higher degree of tumor invasion, the presence of neurovascular invasion, and a maximum diameter exceeding 5 cm were associated with an elevated risk of lymph node metastasis in patients.

Evaluation of predictive performance of each model

Table 2.

Univariate Logistic Regression Analysis of Factors Affecting Lymph Node Metastasis.

Factor	B	SE	Wald	P	OR(95% CI)
Gender (female versus male)	−0.241	0.239	1.015	.314	0.786(0.491–1.256)
Age (>60y versus ≤60y)	−0.027	0.215	0.015	.901	0.974(0.639–1.484)
Mode of operation	0.702	0.254	7.665	.006	2.018(1.228–3.317)
Degree of tumor invasion	–	–	104.203	<.001	–
T1	–	–	–	–	1.000
T2	0.899	0.403	4.974	.026	2.458(1.115–5.417)
T3	2.660	0.396	45.204	<.001	14.300(6.585–31.056)
T4	3.573	0.400	79.822	<.001	35.606(16.262–77.964)
Tumor site	–	–	0.714	.700	–
Body of stomach	–	–	–	–	1.000
Antrum of stomach	0.258	0.307	0.705	.401	1.295(0.709–2.365)
Cardia of stomach	0.177	0.332	0.284	.594	1.193(0.623–2.285)
Nerve or vascular invasion	2.764	0.266	108.358	<.001	15.86(9.426–26.687)
Maximum tumor diameter	1.648	0.280	34.579	<.001	5.198(3.001–9.004)
PIV	0.002	0.001	7.474	.006	1.002(1.000–1.003)
CEA	0.024	0.012	3.866	.049	1.024(1.000–1.049)

CEA, carcinoembryonic antigen; PIV, pan-immune-inflammation value; OR, odds ratio; CI, confidence interval.

Table 3.

Multivariate Logistic Regression Analysis of Factors Affecting Lymph Node Metastasis.

Factor	B	SE	Wald	P	OR(95% CI)
Mode of operation	0.105	0.347	0.091	.763	1.110(0.562–2.192)
Degree of tumor invasion	–	–	22.646	<.001	–
T1	–	–	–	–	1.000
T2	0.428	0.444	0.930	.335	1.534(0.643–3.661)
T3	1.194	0.474	6.360	.012	3.301(1.305–8.352)
T4	2.042	0.468	19.054	<.001	7.705(3.080–19.272)
Nerve or vascular invasion	1.913	0.325	34.640	<.001	6.772(3.582–12.804)
Maximum tumor diameter	0.837	0.355	5.543	.019	2.308(1.150–4.632)
PIV	0.001	0.001	0.787	.375	1.001(0.999–1.002)
CEA	0.008	0.008	0.898	.343	1.008(0.991–1.025)

CEA, carcinoembryonic antigen; PIV, pan-immune-inflammation value; OR, odds ratio; CI, confidence interval.

To compare the predictive performance of the 7 models, this study employed 10-fold cross-validation and utilized the AUC value, validated on the test dataset, as the primary metric for assessing each model's performance. As shown in Table 4 and Figure 2, the GBM model exhibited the best performance in predicting the occurrence of lymph node metastasis in gastric cancer patients, with an average AUC of 0.927. In this study, the feature importance (Figure 3) was derived from the GBM model. Feature importance enables visualization of the model's internal results, highlighting the significance of specific variables within the model. By inputting the clinical characteristics of a patient with gastric cancer, clinicians can predict that patient's risk of lymph node metastasis (Figure 4).
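The 10-fold cross-validation procedure mentioned above can be sketched with a plain index splitter. This is a generic illustration, not the caret routine used in the study: the function name and the seed are hypothetical, and the folds here are not stratified by outcome, since the article does not say whether stratification was applied.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training_indices, held_out_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out


# 369 is the size of the training group reported in this study.
splits = list(k_fold_indices(369, k=10))
```

Each patient is held out exactly once, so every model is always scored on cases it did not see during fitting; averaging the 10 held-out scores gives the cross-validated estimate used for tuning before the separate test-set evaluation.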

Figure 3.

Feature importance.

Figure 4.

A developed web-based risk estimator.

Table 4.

Prediction Performance Evaluation of Each Model.

Model	AUC	Accuracy	Kappa	Sensitivity (Recall)	Specificity
DT	0.824	0.821	0.638	0.806	0.843
RF	0.923	0.854	0.702	0.847	0.882
SVM	0.721	0.585	0.000	0.750	0.547
GBM	0.927	0.870	0.734	0.875	0.863
GaussianNB	0.914	0.821	0.640	0.861	0.843
MLP	0.907	0.837	0.665	0.882	0.824
LR	0.898	0.821	0.636	0.806	0.882

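AUC, area under the receiver operating characteristic curve; DT, decision tree; RF, random forest; SVM, support vector machine; GBM, gradient boosting machine; GaussianNB, Gaussian naive Bayes; MLP, multilayer perceptron (neural network); LR, logistic regression.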
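The threshold-based columns of Table 4 all follow from a single confusion matrix. The sketch below shows the standard definitions of accuracy, sensitivity, specificity, and Cohen's kappa. The counts passed in are not reported in the article: they are back-calculated, as an assumption, from the GBM row of Table 4 and the validation set sizes in Table 1 (72 patients with and 51 without lymph node metastasis). The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from a 2x2 confusion matrix."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the marginal totals
    # (undefined only in the degenerate case where it equals 1).
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Assumed counts: 63 of 72 positives and 44 of 51 negatives classified correctly.
gbm_metrics = classification_metrics(tp=63, fn=9, tn=44, fp=7)
```

Rounded to three decimals, these assumed counts reproduce the GBM row (accuracy 0.870, kappa 0.734, sensitivity 0.875, specificity 0.863). Kappa is lower than accuracy because it discounts the agreement that the class proportions alone would produce by chance.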

Discussion

As a result of the limited early detection of gastric cancer, over 50% of patients are diagnosed at advanced stages or with metastasis. At present, surgery is the main method for the treatment of gastric cancer, and lymph node metastasis is regarded as the main factor affecting the stage, grade, and survival rate of gastric cancer.^31,32 Therefore, early prediction of the occurrence of lymph node metastasis is vital. To date, several scholars have concentrated on lymph node metastasis in gastric cancer, while few studies have developed tools to provide accurate predictions. Therefore, the development of precise predictive models is essential to facilitate collaborative decision-making for clinicians and patients. The continuous advancement of artificial intelligence in the field of clinical research has led to the introduction of innovative approaches.

ML uses computer algorithms to learn complex relationships or patterns from large amounts of data. Existing algorithms are trained on the data and iteratively adjusted to achieve the best performance, thus producing a model that links multiple variables with target variables.³³ Specifically, supervised ML identifies relationships between input and output data (ie, computers learn from patient data), thus allowing predictions of outcomes based on input data. In addition, ML contributes to a paradigm shift in healthcare, where computers learn from patient data without having to be specifically programmed to accomplish tasks. Its advantages are high capability, objectivity, and repeatability when processing large data sets, together with reliable results.³⁴ It also has the potential to improve the quality of early diagnosis, identify disease progression, and increase the likelihood of predicting orthopedic patient-specific outcomes, such as outcome scores, risk of complications, and survival of implants.³⁵ These advantages can facilitate the sharing of decision-making information between clinicians and patients and support effective planning and rational use of health care services.³⁶ In addition, the model can be retrained over time to improve the prediction accuracy.³⁷

ML represents an evolving frontier in the field of medicine, drawing substantial resources to connect computer science and statistical analysis with medical challenges. ML has the capacity to effectively handle extensive, diverse, and intricate medical data. Consequently, the implementation of ML techniques in medicine is widely regarded as the cornerstone of future endeavors in biomedical research, personalized medicine, and computer-aided diagnosis.^38,39 Specifically, the operational framework of ML involves development of algorithms to execute numerous tasks, refining the algorithms iteratively to optimize performance. Ultimately, this process yields a model that establishes connections between multiple variables and target outcomes. In the present study, clinical data were collected, and ML algorithms were employed to develop a model for assessing the risk of lymph node metastasis in gastric cancer. By leveraging multiple variables, clinicians can employ this AI-driven approach to select more efficacious treatment strategies.^40–43

In this study, in addition to some clinicopathological data, hematological indicators, namely the immunoinflammatory factors (PIV and CEA), were utilized to develop the prediction model. PIV is a novel blood-based biomarker that integrates different subsets of peripheral blood immune cells: neutrophils, platelets, monocytes, and lymphocytes. As PIV has the potential to comprehensively represent patients’ immunity and systemic inflammation, it may serve as a robust predictor in advanced cancer patients undergoing cytotoxic chemotherapy, immunotherapy, and targeted therapy. It has been previously demonstrated that PIV is mainly dependent on neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, lymphocyte-to-monocyte ratio, and other indicators in predicting cancer prognosis.⁴⁴ CEA is a widely used serum tumor marker in clinical practice, particularly in the early screening of various types of cancer, and its elevation is also regarded as an independent risk factor for poor prognosis of gastric cancer.⁴⁵ A model that combines clinicopathological data with hematological indicators can better reflect the physiological and pathological changes of patients with gastric cancer during the disease, making the model more representative.

The results of the traditional logistic algorithm indicated that surgical method, degree of tumor invasion, nerve or vascular invasion, maximum tumor diameter, PIV, and CEA level were the influential factors of lymph node metastasis in patients with gastric cancer. These factors were entered into the multivariate LR model. A greater degree of tumor invasion, the presence of nerve or vascular invasion, and a maximum tumor diameter >5 cm were the independent influential factors of lymph node metastasis in gastric cancer patients. Seven models were established for comparative analysis, utilizing the AUC as the benchmark for assessment. The outcomes were as follows: the AUC for the DT model was 0.824, the RF model yielded an AUC of 0.923, the AUC for SVM was 0.721, and the GBM model demonstrated an AUC of 0.927. The NB model's AUC stood at 0.914, the NNET model's AUC reached 0.907, and the LR model's AUC was 0.898 (Table 4). The results of the seven models indicated that the GBM model displayed the most reliable performance, while SVM exhibited the least promising results. Furthermore, a feature importance ranking was developed based on the highly effective GBM model, which highlighted that factors such as nerve or vascular invasion, CEA level, maximum tumor diameter, PIV, age, and tumor site were significant contributors to the occurrence of lymph node metastasis.

Utilizing the top-performing GBM model as a basis, a web-based risk estimator was developed, together with a feature importance assessment. The feature importance analysis served to underscore the significance of specific indicators within the model, thereby offering insight into the model's structure. Meanwhile, the web-based risk estimator directly computed the likelihood of lymph node metastasis in gastric cancer patients based on their clinical data. This tool is user-friendly with a straightforward design, rendering it accessible to healthcare practitioners. It may serve as a valuable aid to clinicians in diagnosis and treatment.

Through a comparative analysis of the proposed ML model and the conventional logistic model, commonalities and distinctions were assessed between them. The convergence between these models is evident in their shared findings, highlighting the significance of factors, such as nerve or vascular invasion, maximum tumor diameter, and the extent of tumor invasion as influential factors of lymph node metastasis. Nevertheless, the divergence is apparent in the ability of ML to discern the predictive potential of additional indicators for lymph node metastasis through a more precise and intricate computation. For instance, the feature importance analysis underscored that parameters, including PIV, CEA level, surgical approach, and tumor location should not be disregarded when predicting lymphatic metastasis.

The conventional logistic model was traditionally developed by conducting univariate analysis, followed by the inclusion of variables that exhibited statistical significance in the univariate analysis into the multivariate analysis to enhance the model's robustness. However, it is noteworthy that variables with a P-value exceeding 0.05 in the univariate analysis can still exhibit significance when considered in the multivariate analysis. Consequently, traditional logistic modeling may be prone to inaccuracies in the final model. In contrast, the clinical prediction model established using ML can mitigate this limitation because it evaluates all candidate variables together. The performance of these models was demonstrated by the AUC, and their dependability could additionally be assessed through the Kappa value.

Advantages and Limitation

This study included 369 patients from the Affiliated Hospital of Xuzhou Medical University for internal validation and 123 patients from Jining First People's Hospital for external validation. The model developed using ML algorithms not only demonstrated superior performance, but also exhibited enhanced accuracy and reliability in its external applicability. However, retrospective studies are inherently prone to subjective and selection biases. Furthermore, while this is a two-center study, additional institutions should be involved to further validate the external applicability of the model.

Conclusions

In conclusion, this study proposed a computational model through ML algorithms, providing physicians with the capability for real-time risk assessment of patients and aiding in personalized diagnosis and treatment. Compared with traditional LR, ML models are associated with more precise computational capabilities and direct clinical utility through advanced algorithms, emphasizing their substantial potential in clinical practice.

Footnotes

Acknowledgements

The authors thank the Affiliated Hospital of Xuzhou Medical University and Jining First People's Hospital for their support of this study. We thank all staff members involved in this study.

Authors’ Contributions

Tong Lu and Yu Fang designed the study and wrote the manuscript. Tao Li and Chong Chen analyzed the data. Haonan Liu collected the data. Miao Lu and Daqing Song revised the manuscript.

Availability of Data and Materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethics Approval and Consent to Participate

The risk posed by the study to subjects does not exceed the minimum risk. The study protocol was approved by the Ethics Committees of the Affiliated Hospital of Xuzhou Medical University (approval number XYFY2021-KL145-01) and Jining First People's Hospital (JNRM-L62). All methods were conducted following relevant guidelines and regulations. The Ethics Committees of the Affiliated Hospital of Xuzhou Medical University and Jining First People's Hospital waived the requirement for informed consent.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Tong Lu

References

Salvatori

Marafini

Laudisi

Monteleone

Stolfi

. Helicobacter pylori and gastric cancer: Pathogenetic Mechanisms. Int J Mol Sci. 2023;24(3):2895. doi:10.3390/ijms24032895

Liu

. Chinese Guidelines for diagnosis and treatment of gastric cancer 2018 (English version). Chin J Cancer Res. 2019;31(5):707–773. doi:10.21147/j.issn.1000-9604.2019.05.02

Tian

, et al. Clinical value of regional lymph node sorting in gastric cancer. World J Gastrointest Oncol. 2022;14(12):2393–2403. doi:10.4251/wjgo.v14.i12.2393

Wang

Gong

Tang

Cui

. Neutrophil/lymphocyte ratio predicts lymph node metastasis in patients with gastric cancer. Am J Transl Res. 2023;15(2):1412–1420. doi:10.1002/bjs.11928

Tian

, et al. Clinical value of regional lymph node sorting in gastric cancer. World J Gastrointest Oncol. 2022;14(12):2393–2403. doi:10.4251/wjgo.v14.i12.2393

Wang

Chen

Gao

, et al. Predicting gastric cancer outcome from resected lymph node histopathology images using deep learning. Nat Commun. 2021;12(1):1637. doi: 10. 1038/s41467-021-21674-7

Dong

Fang

Tang

, et al. Deep learning radiomic nomogram can predict the number of lymph node metastasis in locally advanced gastric cancer: An international multicenter study. Ann Oncol. 2020;31(7):912–920. doi: 10.1016/j.annonc.2020.04.003

Wang

Liu

, et al. CT radiomics nomogram for the preoperative prediction of lymph node metastasis in gastric cancer. Eur Radiol. 2020;30(2):976–986. doi:10.1007/s00330-019-06398-z

Jin

Jiang

, et al. Deep learning analysis of the primary tumour and the prediction of lymph node metastases in gastric cancer. Br J Surg. 2021;108(5):542–549. doi:10.1002/bjs.11928

10.

Topol

. High-performance medicine: The convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56. doi:10.1038/s41591-018-0300-7

11.

Xie

Xiong

Lei

Feng

. Machine learning for lymph node metastasis prediction of in patients with gastric cancer: A systematic review and meta-analysis. Front Oncol. 2022;12:946038. doi:10.3389/fonc.2022.946038

12.

Bhinder

Gilvary

Madhukar

, et al. Artificial intelligence in cancer research and precision medicine. Cancer Discov. 2021;11(4):900–915. doi:10.1158/2159-8290.CD-21-0090

13.

Mainali

. Artificial intelligence in medical science: Perspective from a medical student. J Nepal Med Assoc. 2020;58(229):709–711. doi:10.31729/jnma.5257

14. Seifert, Weber, Kocakavuk, Rischpler, Kersting. Artificial intelligence and machine learning in nuclear medicine: Future perspectives. Semin Nucl Med. 2021;51(2):170–177. doi:10.1053/j.semnuclmed.2020.08.003

15. Luo, Gao, Gan, Xie. Clinical-radiomics nomogram for predicting esophagogastric variceal bleeding risk noninvasively in patients with cirrhosis. World J Gastroenterol. 2023;29(6):1076–1089. doi:10.3748/wjg.v29.i6.1076

16. Yuan, Chen. Comparison of the effectiveness of different machine learning algorithms in predicting new fractures after PKP for osteoporotic vertebral compression fractures. J Orthop Surg Res. 2023;18(1):62. doi:10.1186/s13018-023-03551-9

17. Collins, Reitsma, Altman, Moons. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. Br Med J. 2015;350:g7594. doi:10.1136/bmj.g7594. PMID: 25569120.

18. Zhou, Wang, Yang, Zhu. Predicting postoperative gastric cancer prognosis based on inflammatory factors and machine learning technology. BMC Med Inform Decis Mak. 2023;23(1):53. doi:10.1186/s12911-023-02150-2

19. Song, Liu, Wang. Comparison of machine learning and logistic regression models in predicting acute kidney injury: A systematic review and meta-analysis. Int J Med Inform. 2021;151:104484. doi:10.1016/j.ijmedinf.2021.104484

20. Koga, Zhou, Dickson. Machine learning-based decision tree classifier for the diagnosis of progressive supranuclear palsy and corticobasal degeneration. Neuropathol Appl Neurobiol. 2021;47(7):931–941. doi:10.1111/nan.12710

21. Collin, Durif, Raynal, et al. Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC random forest. Mol Ecol Resour. 2021;21(8):2598–2613. doi:10.1111/1755-0998.13413

22. Choi, Coyner, Kalpathy-Cramer, Chiang, Campbell. Introduction to machine learning, neural networks, and deep learning. Transl Vis Sci Technol. 2020;9(2):14. doi:10.1167/tvst.9.2.14

23. Citko, Sienko. Inpainted image reconstruction using an extended Hopfield neural network based machine learning system. Sensors (Basel). 2022;22(3):813. doi:10.3390/s22030813

24. Dinh, Miertschin, Young, Mohanty. A data-driven approach to predicting diabetes and cardiovascular disease with machine learning. BMC Med Inform Decis Mak. 2019;19(1):211. doi:10.1186/s12911-019-0918-5

25. Fang. Stroke prediction with machine learning methods among older Chinese. Int J Environ Res Public Health. 2020;17(6):1828. doi:10.3390/ijerph17061828

26. Cha, Moon, Kim. Comparison of random forest and gradient boosting machine models for predicting demolition waste based on small datasets and categorical variables. Int J Environ Res Public Health. 2021;18(16):8530. doi:10.3390/ijerph18168530

27. Senders, Staples, Mehrtash, et al. An online calculator for the prediction of survival in glioblastoma patients using classical statistics and machine learning. Neurosurgery. 2020;86(2):E184–E192. doi:10.1093/neuros/nyz403

28. Chang, Lin, Lane. Machine learning and novel biomarkers for the diagnosis of Alzheimer's disease. Int J Mol Sci. 2021;22(5):2761. doi:10.3390/ijms22052761

29. Peiffer-Smadja, Rawson, Ahmad, et al. Machine learning for clinical decision support in infectious diseases: A narrative review of current applications. Clin Microbiol Infect. 2020;26(5):584–595. doi:10.1016/j.cmi.2019.09.009. Erratum in: Clin Microbiol Infect. 2020;26(8):1118.

30. Zhang, Shao. PET/CT for predicting occult lymph node metastasis in gastric cancer. Curr Oncol. 2022;29(9):6523–6539. doi:10.3390/curroncol29090513

31. Zhou, Zhao, et al. Establishment and validation for predicting the lymph node metastasis in early gastric adenocarcinoma. J Healthc Eng. 2022;2022:8399822. doi:10.1155/2022/8399822

32. Obermeyer, Emanuel. Predicting the future - big data, machine learning, and clinical medicine. N Engl J Med. 2016;375:1216–1219. doi:10.1056/NEJMp1606181

33. Bayliss, Jones. The role of artificial intelligence and machine learning in predicting orthopaedic outcomes. Bone Joint J. 2019;101-B:1476–1478. doi:10.1302/0301-620X.101B12.BJJ-2019-0850.R1

34. Devries, Hoda, Rivers, Maher, Phan. Development of an unsupervised machine learning algorithm for the prognostication of walking ability in spinal cord injury patients. Spine J. 2019;20:213–224. doi:10.1016/j.spinee.2019.09.007

35. Bien, Rajpurkar, Ball, Irvin, Lungren. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet. PLoS Med. 2018;15:e1002699. doi:10.1371/journal.pmed.1002699

36. Deng, Tang, Sheng RSF. Detecting fatigue status of pilots based on deep learning network using EEG signals. IEEE Trans Cogn Dev Syst. 2020;13:575–585. doi:10.1109/TCDS.2019.2963476

37. MacEachern, Forkert. Machine learning for precision medicine. Genome. 2021;64(4):416–425. doi:10.1139/gen-2020-0131

38. Handelman, Kok, Chandra, Razavi, Lee, Asadi. eDoctor: Machine learning and the future of medicine. J Intern Med. 2018;284(6):603–619. doi:10.1111/joim.12822

39. Seligman, Tuljapurkar, Rehkopf. Machine learning approaches to the social determinants of health in the health and retirement study. SSM Popul Health. 2017;4:95–99. doi:10.1016/j.ssmph.2017.11.008

40. Sim, Fong, Huang, Tan. Machine learning in medicine: What clinicians should know. Singapore Med J. 2023;64(2):91–97. doi:10.11622/smedj.2021054

41. Arslan, Schulz, Rai. Machine learning in epigenomics: Insights into cancer biology and medicine. Biochim Biophys Acta Rev Cancer. 2021;1876(2):188588. doi:10.1016/j.bbcan.2021.188588

42. Zeng, Liu, Fang, et al. PIV and PILE score at baseline predict clinical outcome of anti-PD-1/PD-L1 inhibitor combined with chemotherapy in extensive-stage small cell lung cancer patients. Front Immunol. 2021;12:724443. doi:10.3389/fimmu.2021.724443

43. Yang, Liu, Tong, Liang, Chen. Prognostic value of pan-immune-inflammation value in colorectal cancer patients: A systematic review and meta-analysis. Front Oncol. 2022;12:1036890. doi:10.3389/fonc.2022.1036890

44. Şahin, Cubukcu, Ocak, et al. Low pan-immune-inflammation-value predicts better chemotherapy response and survival in breast cancer patients treated with neoadjuvant chemotherapy. Sci Rep. 2021;11(1):14662. doi:10.1038/s41598-021-94184-7

45. Feng, Tian, et al. Diagnostic and prognostic value of CEA, CA19-9, AFP and CA125 for early gastric cancer. BMC Cancer. 2017;17(1):737. doi:10.1186/s12885-017-3738-y

Comparison of Machine Learning and Logic Regression Algorithms for Predicting Lymph Node Metastasis in Patients with Gastric Cancer: A two-Center Study
