Abstract
Keywords
Introduction
Globally, gastric adenocarcinoma (GA) is the fifth most common cause of cancer-related death in men and the fourth most common in women. 1 It most often arises in the distal (pyloric) region of the stomach. Because the disease is often asymptomatic, most cases in areas without an effective GA screening program are diagnosed at an advanced stage, 2 when the prognosis is generally poor if metastases have already formed. 3 Hematoxylin and eosin (HE) staining and characteristic immunohistochemistry of GA are shown in Figure 1. GA usually progresses slowly but ultimately develops multiple metastases, most commonly in the liver and distant lymph nodes, whereas ocular metastases (OM) are rare. 4 One case report described iris metastasis three years after the diagnosis of gastric cancer, 5 and another described ocular metastasis eight months after tumor diagnosis, while the patient was receiving systemic chemotherapy. 6 OM of GA mainly manifests as eyelid swelling, eye pain, and even blindness, and severely impairs quality of life. 7 Early monitoring for ocular disease in GA patients is therefore important for relieving patients’ suffering.

HE staining and characteristic immunohistochemistry of gastric adenocarcinoma. Notes: Immunohistochemical staining used CK7 as the immunomarker in (C) and CK20 as the immunomarker in (D). Abbreviation: HE, hematoxylin and eosin staining.
Artificial intelligence (AI) shows excellent potential in multiple medical fields. 8 Machine learning (ML), a subfield of AI, has advantages over other techniques in developing predictive models. 9 It “learns” from data without being explicitly programmed, which means that its performance on a given task improves as more data and variables become available. Recently, ML has shown promise in diagnostic applications. For example, the accuracy of an AI diagnostic system for detecting upper gastrointestinal cancer has been reported to exceed 91.7%. 10 In addition, researchers have proposed that deep learning models can classify diffuse-type gastric adenocarcinoma with high specificity and could serve as workflow tools to help pathologists make diagnoses. 11 ML has also made progress in the treatment of GA and its complications. Researchers have developed an ML method that predicts anastomotic leakage in real time in patients with GA undergoing gastrectomy, which can guide surgeons’ intraoperative decision-making. 12 ML can also help assess the prognosis of patients with GA. For example, lymphatic vessel invasion and perineural invasion are associated with poor prognosis in GA, and ML-based CT texture analysis may predict both in tubular GA. 13
Although numerous studies have demonstrated the strengths of AI in the diagnosis, treatment, and prognosis of GA, no model yet exists to predict the risk of OM in patients with GA. This study aimed to identify potential biomarkers of OM in GA and to construct risk prediction models from demographic and serological indicators in a large sample of GA patients. By comparing ML models, we selected an optimal model and developed a web calculator for personalized prediction of OM in GA, with the aim of improving the prognosis of GA patients.
Methods
Study Design
This retrospective cohort study included 3532 patients diagnosed with GA between June 2003 and May 2019. The study was approved by the Medical Research Ethics Committee of the First Affiliated Hospital of Nanchang University (approval number cdyfy20170411). Other types of gastric cancer, including signet ring cell carcinoma, adenosquamous carcinoma, myeloid carcinoma, and undifferentiated carcinoma, were excluded by pathological examination. Patients with missing information were excluded. Patients were randomly divided into a training set and a validation set at a ratio of 7:3.
OM of GA can be divided into choroidal metastasis and retinal metastasis. In choroidal metastasis, fundus fluorescein angiography showed diffuse fluorescence of the tumor, which is composed mainly of cancer cells, matching the tumor size in the arterial and early venous phases because of choroidal fluorescence shielding. Optical coherence tomography showed elevated choroidal lesions with subretinal effusion. B-scan ultrasonography showed a flat, elevated mass, and Doppler ultrasonography showed blood flow within the tumor. In retinal metastasis, high-resolution thin-section scanning showed calcification. When the tumor extended beyond the eyeball, it spread along the optic nerve and invaded the brain, producing optic nerve thickening and an orbital or intracranial mass. Magnetic resonance imaging (MRI) showed an irregular soft tissue mass in the eyeball, with slightly higher signal than normal vitreous on T1WI and unevenly lower signal on T2WI. When retinal detachment was present, MRI showed the subretinal effusion as a fluid signal isointense with the vitreous. All patients underwent imaging, and those in whom OM was excluded were classified as non-OM (NOM). We applied the synthetic minority oversampling technique (SMOTE) to the training set to increase the sample size of the OM group 14 (Figure 2).
SMOTE balances class distributions well; it can reduce model overfitting and improve accuracy, sensitivity, and specificity on the test set. 15 All participants understood the purpose and content of our study and signed informed consent. The researchers removed the subjects’ private information for this study. The reporting of this study conformed to the STROBE guidelines. 16
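The Results section and the Table 3 note describe SMOTE being applied to the training set, with the test set kept at its original, imbalanced distribution. A minimal NumPy sketch of that order (split first, then oversample only the training part) follows. It is a simplified re-implementation of SMOTE's nearest-neighbor interpolation for illustration, not the code used in the study; a production pipeline would more likely call `imblearn.over_sampling.SMOTE`. The function names, the stratified split, `k=5`, and the simulated data are assumptions.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors per sample
    base = rng.integers(0, n, size=n_new)       # seed samples
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def split_then_balance(X, y, test_frac=0.3, seed=0):
    """Stratified random split, then SMOTE on the training part only."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    test_idx = []
    for cls in np.unique(y):                    # stratify so both sets contain OM cases
        idx = rng.permutation(np.flatnonzero(y == cls))
        test_idx.extend(idx[: int(round(test_frac * len(idx)))])
    test_mask = np.zeros(len(y), dtype=bool)
    test_mask[test_idx] = True
    X_tr, y_tr = X[~test_mask], y[~test_mask]
    X_te, y_te = X[test_mask], y[test_mask]
    # Oversample the minority class (label 1) in the training set only;
    # the test set keeps its original, imbalanced distribution.
    n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
    X_syn = smote(X_tr[y_tr == 1], n_new, rng=rng)
    X_tr = np.vstack([X_tr, X_syn])
    y_tr = np.concatenate([y_tr, np.ones(n_new, dtype=y_tr.dtype)])
    return X_tr, y_tr, X_te, y_te
```

Oversampling after the split keeps synthetic OM cases, which are interpolated from real training patients, out of the test set, so test metrics still reflect the true class balance.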

Summary of patient inclusion. Abbreviations: Hb, hemoglobin; Ca2+, calcium; ALP, alkaline phosphatase; TG, triglyceride; HDL, high-density lipoprotein; LDL, low-density lipoprotein; ApoA1, apolipoprotein A1; ApoB, apolipoprotein B; AFP, alpha fetoprotein; CA724, carbohydrate antigen-724; CEA, carcinoembryonic antigen; CA125, carbohydrate antigen-125; CA153, carbohydrate antigen-153; CA199, carbohydrate antigen-199; CYFRA21-1, cytokeratin 19 fragment antigen 21-1; Lp (a), lipoprotein (a); NOM, no ocular metastasis; OM, ocular metastasis.
Data Collection
Clinical indicators were collected from patient records, including (a) demographic and clinical data: gender, age, histological grade, and treatment; and (b) serological test results at diagnosis: hemoglobin (Hb), calcium (Ca2+), alkaline phosphatase (ALP), total cholesterol (TC), triglyceride (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), apolipoprotein A1 (ApoA1), apolipoprotein B (ApoB), alpha-fetoprotein (AFP), carbohydrate antigens CA724, CA125, CA153, and CA199, carcinoembryonic antigen (CEA), cytokeratin-19 fragment (CYFRA21-1), and lipoprotein (a) (Lp (a)).
Statistical Analysis
Statistical analysis was conducted using Python (version 3.8, Python Software Foundation) and R software (version 4.0.2). In Python, the training set was used to build the models, and the validation set was used to validate and evaluate them. Normally distributed continuous data were compared using an independent samples t-test, non-normally distributed continuous data using the Mann–Whitney U test, and categorical data using the chi-square test. Univariate and multivariate logistic regression were used to identify risk factors for OM in patients with GA. ML feature variables were selected based on random forest (RF) feature importance and forward sequential feature selection. Python was also used to develop and evaluate the ML models and to build the web calculator. SHapley Additive exPlanations (SHAP) were computed with the Python SHAP package to interpret the models. A p-value < 0.05 was considered statistically significant, and 95% confidence intervals (CIs) were reported for all logistic regression analyses.
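The paper reports 95% CIs for its logistic regression analyses but does not say how they were computed. A common choice is the Wald interval on the log-odds scale, exponentiated to give an odds ratio (OR). The sketch below assumes that choice; the function names and the example coefficient are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution

def odds_ratio_ci(beta, se, z=Z_975):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def wald_p_value(beta, se):
    """Two-sided Wald test p-value for H0: beta = 0 (normal approximation)."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

# Hypothetical coefficient for a 1-unit increase in a predictor such as LDL
or_, lo, hi = odds_ratio_ci(beta=0.85, se=0.30)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), p = {wald_p_value(0.85, 0.30):.4f}")
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the OR itself; a CI that excludes 1 corresponds to p < 0.05 under the same Wald test.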
Data Preprocessing and Feature Engineering
The feature variables for the subsequent ML algorithms were selected by RF-based feature importance ranking followed by forward sequential feature selection. 16 First, all candidate variables were ranked by feature importance. They were then entered into a hierarchical clustering algorithm to remove variables with multicollinearity, and the remaining variables were re-ranked. Finally, the number of feature variables was fixed at the point where the area under the receiver operating characteristic (ROC) curve (AUC) stabilized. Because the AUC did not improve significantly after the 14th iteration, we selected the first 13 variables for ML model development (Figure 4A): LDL, CEA, CA724, CA125, TC, Ca2+, HDL, AFP, CA153, CA199, TG, Hb, and ALP. We used the SHAP package to rank the importance of risk factor variables in patients with GA. For each patient, SHAP assigns every feature a contribution (its Shapley value) to the model's prediction, and the sum or mean of the absolute Shapley values across all samples gives the overall importance score of that feature. SHAP also shows whether each feature value pushes the prediction up or down, similar to the sign of a coefficient in logistic regression: a positive SHAP value indicates that the feature increases the predicted risk of OM, and a negative value indicates that it lowers the risk. 17
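The selection procedure above (add features one at a time, stop when the AUC stabilizes) can be sketched in plain Python. The `score` callback is a hypothetical stand-in for fitting a model on a feature subset and returning its validation AUC, and the plateau tolerance of 0.005 is an assumption, since the paper states only that the AUC "did not improve significantly". The toy scoring function is additive and purely illustrative.

```python
def forward_select(candidates, score):
    """Greedy forward sequential selection.

    `score(subset)` must return a validation AUC for a model trained on `subset`.
    Returns the features in the order they were added and the AUC after each step.
    """
    selected, remaining, history = [], list(candidates), []
    while remaining:
        # Add whichever remaining feature gives the highest AUC together with `selected`
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append(score(selected))
    return selected, history

def choose_n_features(auc_by_k, tol=0.005):
    """Smallest k such that no larger subset beats the k-feature AUC by more than tol."""
    for k in range(1, len(auc_by_k) + 1):
        later = auc_by_k[k:]
        if not later or max(later) - auc_by_k[k - 1] <= tol:
            return k
    return 0  # only reached for an empty input

# Toy, purely illustrative score: baseline 0.5 plus additive feature weights
weights = {"LDL": 0.10, "CEA": 0.05, "CA724": 0.04, "Hb": 0.001}

def toy_score(subset):
    return 0.5 + sum(weights[f] for f in subset)

order, aucs = forward_select(["Hb", "CA724", "LDL", "CEA"], toy_score)
print(order, choose_n_features(aucs))
```

With this toy score, the fourth feature adds only 0.001 AUC, so the plateau rule keeps the first three.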
Model Establishment
All models were built using scikit-learn (version 0.24.2). We used six ML algorithms: multilayer perceptron (MLP), 18 adaptive boosting (AB), 19 bootstrap aggregating (BAG), 20 logistic regression (LR), 21 gradient boosting machine (GBM), 22 and extreme gradient boosting (XGB). 23 Each algorithm was trained and tuned to predict OM in patients with GA, with hyperparameters optimized by random search in scikit-learn. The models were compared on the training and test sets using the AUC, precision–recall (PR) curve, confusion matrix, sensitivity, specificity, and F1 score. Finally, the best-performing model was used to build a web calculator.
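The random hyperparameter search described above can be sketched with scikit-learn's `RandomizedSearchCV` and `GradientBoostingClassifier`. The search space, `n_iter`, the five-fold stratified CV, and the synthetic data from `make_classification` are all assumptions for illustration; the paper does not report the ranges it searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for the 13 selected features (not study data)
X, y = make_classification(n_samples=400, n_features=13, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Hypothetical search space; each candidate is drawn at random from these lists
param_distributions = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4],
    "subsample": [0.7, 0.85, 1.0],
}
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=8,                       # number of random configurations tried
    scoring="roc_auc",              # rank configurations by cross-validated AUC
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

If SMOTE is combined with a cross-validated search, the oversampler should be fitted inside each training fold (for example with imbalanced-learn's `Pipeline`); otherwise synthetic points derived from validation-fold patients can inflate the CV AUC.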
Results
Demographic Baseline Data
After screening, data from 3008 GA patients were included; during the study period, 20 patients developed OM and 2988 did not (NOM). Pathological type, LDL, CA724, and CEA differed significantly between the OM and NOM groups (p < 0.05). The demographic and clinicopathological features of these patients are detailed in Table 1.
Demographic and Clinicopathological Characteristics of Patients.
*p < 0.05.
Abbreviations: Hb, hemoglobin; Ca2+, calcium; ALP, alkaline phosphatase; TC, total cholesterol; TG, triglyceride; HDL, high-density lipoprotein; LDL, low-density lipoprotein; ApoA1, apolipoprotein A1; ApoB, apolipoprotein B; AFP, alpha fetoprotein; CA724, carbohydrate antigen-724; CEA, carcinoembryonic antigen; CA125, carbohydrate antigen-125; CA153, carbohydrate antigen-153; CA199, carbohydrate antigen-199; CYFRA21-1, cytokeratin 19 fragment antigen 21-1; Lp (a), lipoprotein (a).
Univariate Analysis, Multivariate Logistic Regression, and Least Absolute Shrinkage and Selection Operator (LASSO) Regression
Variables with p < 0.05 in univariate logistic regression were entered into multivariate logistic regression to identify risk factors for OM in patients with GA. In univariate logistic regression, LDL was a risk factor for OM, and multivariate logistic regression showed that LDL was an independent risk factor for OM of GA (Table 2). LASSO regression then selected pathological type, TG, TC, HDL, and LDL, which were included as characteristic variables in the six ML models (Figure 3).
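The paper does not state which software performed the LASSO step (both R and Python were used). The sketch below shows an analogous L1-penalized logistic regression in scikit-learn, choosing the penalty strength by five-fold cross-validated AUC and keeping the features with nonzero coefficients. The feature names are reused only as labels; the data are simulated, and the settings (`Cs=20`, `scoring="roc_auc"`) are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
names = ["TG", "TC", "HDL", "LDL", "noise_1", "noise_2", "noise_3"]
X = rng.normal(size=(600, len(names)))
# Simulated outcome that depends only on the first four columns
logit = -1.0 + X[:, :4] @ np.array([0.6, -0.5, -0.9, 1.2])
y = (rng.random(600) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# L1 penalties are scale-sensitive, so standardize first
X_std = StandardScaler().fit_transform(X)
lasso = LogisticRegressionCV(Cs=20, cv=5, penalty="l1", solver="liblinear",
                             scoring="roc_auc", random_state=0)
lasso.fit(X_std, y)

# Features whose coefficients survive the L1 penalty are the "selected" ones
selected = [n for n, c in zip(names, lasso.coef_.ravel()) if c != 0.0]
print("chosen C:", lasso.C_[0], "selected:", selected)
```

The coefficient-path and cross-validation plots in Figure 3 correspond to refitting such a model over a grid of penalty strengths and tracking the coefficients and CV error at each one.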

(A) Plot for LASSO regression coefficients. (B) Cross validation plot. Abbreviations: LASSO, least absolute shrinkage and selection operator.
Univariate and Multivariate Logistic Regression Analyses.
*p < 0.05.
Abbreviations: Hb, hemoglobin; Ca2+, calcium; ALP, alkaline phosphatase; TC, total cholesterol; TG, triglyceride; HDL, high-density lipoprotein; LDL, low-density lipoprotein; ApoA1, apolipoprotein A1; ApoB, apolipoprotein B; AFP, alpha fetoprotein; CA724, carbohydrate antigen-724; CEA, carcinoembryonic antigen; CA125, carbohydrate antigen-125; CA153, carbohydrate antigen-153; CA199, carbohydrate antigen-199; CYFRA21-1, cytokeratin 19 fragment antigen 21-1; Lp (a), lipoprotein (a).
Model Performance
We established six ML models (MLP, AB, BAG, LR, GBM, and XGB) to estimate the probability of OM in patients with GA and assess their accuracy. The GBM model performed best on the test set, with an AUC of 0.997, an accuracy of 0.989, a sensitivity of 0.556, and a specificity of 0.995 (Table 3). For predicting OM in GA patients, the GBM algorithm had an AUC of 1 in the training set and 0.950 in the internal test set (Figure 4B). On the PR curves, the GBM model had an AUC of 1 in the training set (Figure 5A) and 0.747 in the test set (Figure 5B). We also plotted confusion matrices for the model: all predictions were correct in the training set after SMOTE processing (Figure 6A), and in the test set, which kept its original distribution, 744 predictions were correct and 8 were incorrect, giving an accuracy of 0.989 (Figure 6B). We also performed ROC analysis of the ML methods trained on imbalanced data; after SMOTE balancing, test-set performance improved without significant overfitting (Figure 6C; Table 3). Internal five-fold cross-validation of the best model, GBM, gave a mean AUC of 0.99 with a standard deviation of 0.00 (Figure 6D). Finally, radar charts comparing the six ML models showed that GBM had the best F1 score, accuracy, and AUC (Figure 6E).
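The test-set accuracy above follows directly from the confusion-matrix totals: 744 correct out of 752 predictions gives 744/752 ≈ 0.989. The sketch below computes the metrics used in Table 3 from a 2x2 confusion matrix with OM as the positive class. The counts in the example (tp = 5, fn = 4, tn = 739, fp = 4) are a hypothetical split chosen because it is arithmetically consistent with the reported 744/8 totals, sensitivity of 0.556, and specificity of 0.995; they were not read from Figure 6B.

```python
def binary_metrics(tp, fp, tn, fn):
    """Metrics derived from a 2x2 confusion matrix, with OM as the positive class."""
    def ratio(num, den):
        return num / den if den else float("nan")
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),      # recall / true-positive rate
        "specificity": ratio(tn, tn + fp),      # true-negative rate
        "precision": ratio(tp, tp + fp),        # positive predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }

# Hypothetical split consistent with 744 correct / 8 incorrect predictions
m = binary_metrics(tp=5, fp=4, tn=739, fn=4)
print({k: round(v, 3) for k, v in m.items()})
```

With so few OM cases in a test set of this size, accuracy and specificity are dominated by the NOM majority, which is why sensitivity, F1, and the PR curve are the more informative metrics here.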

Machine learning feature variable extraction and ROC curves for the training and testing sets. Notes: (A) Feature importance ranking and forward sequential feature selection based on random forests; selected features are highlighted in red. (B) ROC curves for the training and testing sets under different ML algorithms. Notably, the GBM model achieves an AUC of 1 in the training set and 0.997 in the testing set. Abbreviations: AUC, area under the curve; AB, adaptive boosting; MLP, multilayer perceptron; BAG, bootstrap aggregating; LR, logistic regression; GBM, gradient boosting machine; XGB, extreme gradient boosting.

Precision–recall curves for six ML algorithms on the training and testing sets. Notes: (A) PR curves for the training set; GBM performed best, with an AUC of 1. (B) PR curves for the testing set; GBM performed best, with an AUC of 0.747. Abbreviations: PR, precision–recall; AB, adaptive boosting; MLP, multilayer perceptron; BAG, bootstrap aggregating; LR, logistic regression; GBM, gradient boosting machine; XGB, extreme gradient boosting.

Confusion matrices for the training and testing sets, five-fold cross-validated ROC curves for the best ML model (GBM), and a radar chart comparing ML methods across evaluation metrics. Notes: (A, B) Confusion matrices for the training and testing sets; accuracy in the testing set was 0.989. (C) After SMOTE balancing of the data, the models performed better on the testing set without significant overfitting. (D) Five-fold cross-validated ROC curves for GBM, with an AUC of 0.99 ± 0.00. (E) Radar chart of the ML algorithms across evaluation metrics; GBM performed best in F1 score, accuracy, and AUC. Abbreviations: GBM, gradient boosting machine; ROC, receiver operating characteristic; AUC, area under the ROC curve; SMOTE, synthetic minority oversampling technique.
Comparison of performance metrics of six machine learning methods on the test set.
Abbreviations: ML, machine learning; AUC, area under the curve; AB, adaptive boosting; LR, logistic regression; XGB, extreme gradient boosting; BAG, bootstrap aggregating; MLP, multilayer perceptron; GBM, gradient boosting machine; IBT, imbalanced training; BT, balanced training.
Note: Comparison of test-set metrics for models trained with and without SMOTE balancing of the training set shows that SMOTE balancing markedly improves test-set performance and further reduces overfitting.
Importance of Characteristic Variables
We used the SHAP library with the GBM model to build a risk factor model for OM in patients with GA (Figure 7). The variable importance ranking showed that LDL, CA724, CEA, AFP, CA125, Hb, CA153, and Ca2+ were important factors affecting OM in GA patients. In Figure 7A, factors highlighted in red are risk factors for OM, and those highlighted in blue are protective factors. Figure 7B shows a violin plot of the feature importance ranking across the samples used in this study. In addition, based on SHAP values, we selected two subjects, one from the OM group and one from the NOM group. For the OM patient, TC = 0.95, CA199 = 22.99, Ca2+ = 2.25, AFP = 3.42, LDL = 3.20, and HDL = 1.62 were important risk factors for OM, whereas CA724 = 0.93, CEA = 1.74, and other factors were important protective factors (Figure 7C). For the NOM patient, TC = 0.82, AFP = 5.19, and HDL = 1.59 were important risk factors, whereas LDL = 1.58, CA724 = 3.52, and CEA = 2.93 were important protective factors (Figure 7D).
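The global ranking and the per-patient split into risk and protective factors described above can be computed directly from a matrix of SHAP values. In practice that matrix would come from the SHAP package (for a tree model such as GBM, typically via `shap.TreeExplainer`); here it is a small hypothetical array, so the values and the base value are illustrative only. The sketch also checks SHAP's additivity property: a patient's base value plus the sum of their SHAP values equals the model output being explained.

```python
import numpy as np

def global_importance(shap_values, feature_names):
    """Rank features by mean |SHAP| over all samples (the usual SHAP bar-plot ordering)."""
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1]          # largest mean |SHAP| first
    return [(feature_names[i], float(importance[i])) for i in order]

def explain_patient(shap_row, feature_names):
    """Split one patient's SHAP values into risk-raising (>0) and risk-lowering (<0) features."""
    pairs = list(zip(feature_names, map(float, shap_row)))
    risk = sorted((p for p in pairs if p[1] > 0), key=lambda p: -p[1])
    protective = sorted((p for p in pairs if p[1] < 0), key=lambda p: p[1])
    return risk, protective

names = ["LDL", "CA724", "CEA"]
# Hypothetical SHAP matrix: rows are patients, columns are features
shap_matrix = np.array([[0.9, -0.2, 0.1],
                        [-0.7, 0.4, -0.1],
                        [0.8, 0.3, 0.0]])
base_value = -3.0  # hypothetical expected model output (e.g. log-odds)
# Additivity: base value + row sum reproduces each patient's model output
patient_output = base_value + shap_matrix.sum(axis=1)

print(global_importance(shap_matrix, names))
print(explain_patient(shap_matrix[0], names))
```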

SHAP summary plot and SHAP explanations for two patients. Notes: (A, B) Importance ranking of feature variables based on the GBM model. Red indicates variables that are risk factors for OM in GA, and blue indicates variables that act as protective factors. (C) High-risk SHAP explanation for a GA patient with OM. (D) Low-risk SHAP explanation for a GA patient without OM. Abbreviations: SHAP, Shapley additive explanations; GBM, gradient boosting machine; Hb, hemoglobin; Ca2+, calcium; ALP, alkaline phosphatase; TC, total cholesterol; TG, triglyceride; HDL, high-density lipoprotein; LDL, low-density lipoprotein; AFP, alpha fetoprotein; CA724, carbohydrate antigen-724; CEA, carcinoembryonic antigen; CA125, carbohydrate antigen-125; CA153, carbohydrate antigen-153; CA199, carbohydrate antigen-199.
Web Page Calculator
Based on the GBM model, which had the best predictive performance, we developed a web calculator to predict the risk of OM in patients with GA. Entering variable values in the website sidebar yields a predicted probability of OM (https://weicancer2-nnnjun76tn7vtkya5glyua.streamlit.app/; Figure 8).
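The `streamlit.app` address suggests the calculator is a Streamlit app, but its source is not published with the article. A minimal, framework-agnostic sketch of the core step follows: collect the 13 model inputs, arrange them in a fixed order, and return the predicted OM probability from a fitted classifier's `predict_proba`. The feature order (taken from the list in the feature-selection step), the function name, and the assumption that inputs use the same units as the training data are all hypothetical.

```python
# Hypothetical input order: the 13 features listed in the feature-selection step
FEATURES = ["LDL", "CEA", "CA724", "CA125", "TC", "Ca2+", "HDL",
            "AFP", "CA153", "CA199", "TG", "Hb", "ALP"]

def om_probability(model, values):
    """Predicted probability of OM for one patient.

    `model` is any fitted binary classifier exposing predict_proba (for example a
    scikit-learn GradientBoostingClassifier trained with labels 0 = NOM, 1 = OM);
    `values` maps each feature name to a numeric value in the training units.
    """
    missing = [f for f in FEATURES if f not in values]
    if missing:
        raise ValueError(f"missing inputs: {', '.join(missing)}")
    row = [[float(values[f]) for f in FEATURES]]   # one patient, fixed column order
    return float(model.predict_proba(row)[0][1])   # column 1 = positive class (OM)
```

In a Streamlit front end, each entry in `values` could come from `st.sidebar.number_input`, with the returned probability displayed in the main panel; keeping the prediction logic in a plain function like this makes it testable without the UI.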

Web calculator for predicting OM of gastric adenocarcinoma. Note: https://weicancer2-nnnjun76tn7vtkya5glyua.streamlit.app/.
Discussion
To our knowledge, this is the first study to use multiple ML algorithms to predict OM in patients with GA. It yielded a GBM model that could be used clinically to predict OM of GA, and we explained the model's predictions. GBM is a widely used algorithm with good discrimination and fast classification, and it can make effective, accurate predictions from both linear and nonlinear feature variables. 24 Based on the GBM model, we then designed a web risk calculator to estimate the probability of OM in patients with GA, which may help clinicians develop targeted preventive measures to improve prognosis and quality of life.
With advances in diagnosis and treatment, the incidence and mortality of gastric cancer have declined. Despite this, gastric cancer remains a leading cause of cancer-related death worldwide, which is closely related to its asymptomatic course and delayed diagnosis. 25 Approximately 90-97% of gastric cancers are adenocarcinomas, which can be histologically divided into intestinal and diffuse types. 26 Gastric cancer cells can spread to the liver, lungs, adrenal glands, lymph nodes, peritoneum, bones, brain, and eyes through blood and lymphatic vessels. 27 GA metastasis is closely related to tumor phenotype and involves multiple steps, the most important of which are proteolytic activity, migration, adhesion, proliferation, and neovascularization. 28 The eye has the most plentiful blood flow of any organ in the body. According to Paget's “seed and soil” theory, the colonization of metastatic tumors relies primarily on an appropriate microenvironment and sufficient blood flow. 29 In addition, Duke-Elder and Perkins first suggested that the short posterior ciliary arteries, with their multiple branches and abundant peripheral vessels, facilitate the transport of tumor emboli to the posterior uvea. 30 This explains why metastases affect the choroid more often than the ciliary body or sclera. 31 The clinical manifestations of OM of GA include blurred vision, visual loss, flashes and floaters, metamorphopsia, and diplopia, of which metamorphopsia and visual loss are the most prominent. 32 When gastric cancer patients develop OM, mean survival falls from 25.4 months after initial diagnosis to only 3.3 months. 33 Early detection of OM is therefore highly important for the prognosis of patients with gastric cancer, but it remains clinically challenging, because imaging is limited by high cost and low sensitivity and specificity. Low-cost, convenient serum tumor markers may therefore be widely useful.
A meta-analysis has shown that a high-fat diet is positively correlated with gastric cancer incidence. 34 In addition, increasing evidence shows that hyperlipidemia is positively correlated with metastasis in many types of cancer, including esophageal and gastric cancers.35,36 Hypercholesterolemia is a dyslipidemia characterized by elevated plasma levels of LDL cholesterol. LDL is readily oxidized by reactive oxygen species, thereby enhancing lipid peroxidation in tissues. 37 Moreover, evidence shows that oxidized low-density lipoprotein (oxLDL) is a common pathogenic factor in cancer metastasis. 38 For example, one study linked increased oxLDL to cancer development through its binding to the lectin-like oxLDL receptor (LOX-1). 39 Recent studies have shown that oxLDL may up-regulate the expression of vascular endothelial growth factor-C and increase its secretion by gastric cancer cells through the LOX-1/nuclear factor kappa-B signaling pathway, and that silencing LOX-1 in HGC-27 gastric cancer cells with interfering fragments or specific inhibitors suppresses these effects of oxLDL. 40 Our findings are consistent with these reports: LDL was significantly higher in patients with OM than in the NOM group, suggesting that LDL may serve as an independent risk factor for the early prediction of OM. Our results also suggest that cholesterol uptake may contribute to malignant metastasis in GA. Low HDL levels are associated with increased cancer risk. In patients with rectal cancer, higher HDL levels were associated with longer overall survival. 41 Preoperative serum HDL levels in patients with gallbladder cancer are also closely related to distant metastasis. 42 This is consistent with our finding that HDL level is an indicator of reduced OM risk in GA patients.
CA724 is a glycoprotein found mainly in the stomach, breast, pancreas, and ovary, and is a tumor marker used mainly to detect GA and various gastrointestinal tumors. 43 In China, CA724 has been associated with Helicobacter pylori infection. 44 Another study suggested that early CA724 levels were related to vascular invasion in GA and that a decrease in CA724 may indicate a better prognosis. 45 These findings suggest that CA724 can be used to evaluate distant metastasis in GA patients. CA125, a glycoprotein identified in 1983 as an epithelial ovarian cancer antigen, binds the monoclonal antibody OC125. 46 CA125 can also be detected in the serum of patients with gastrointestinal cancer, and most studies of CA125 in GA have shown that elevated serum CA125 is associated with peritoneal metastasis. 47 CA125 is thought to play a key role in the shedding of GA cells from primary lesions and their adhesion to adjacent tissues, and is closely related to tumor invasion and metastasis. 48 Our results likewise suggest that carbohydrate antigens are factors affecting OM in GA patients. CEA is the most commonly used tumor marker for GA. 49 One study showed that the CEA level in peritoneal fluid is a reliable marker of early GA metastasis. 50 Serum CEA has also been associated with hematogenous and lymphatic metastasis of GA, and a meta-analysis showed that a pre-treatment serum CEA concentration >5 ng/mL was significantly associated with GA metastasis. 51 In addition, combining CEA with CA724, CA199, and CA125 can improve the diagnosis of GA metastasis. 52 AFP is an oncoprotein with a glycoprotein structure; in clinical practice, it may be elevated in cancers of many organs, including GA. 53 Compared with AFP-negative GA, AFP-positive GA has been found to have higher proliferative activity, less apoptosis, abundant neovascularization, a greater tendency toward liver and lymph node metastasis, and a poor prognosis. 54 In a study of GA with high AFP, He et al found that serum AFP level was a highly significant prognostic factor for overall survival. 55 In our study, abnormally elevated AFP was also associated with OM in GA patients.
ML is an emerging approach in medicine with the capacity to handle large, complex, and diverse data; it is expected to play a growing role in biomedical research and can promote global health care. 56 Using ML and survival analysis, Li et al found that differentially expressed mRNAs had potential diagnostic and prognostic value for gastric cancer. 57 In ophthalmology, a simple linear model has been used to predict advanced age-related macular degeneration. 58 The RF algorithm has been used to identify the features that best predict progression of geographic atrophy in age-related macular degeneration and to discover prognostic indicators of visual outcomes after intravitreal anti-vascular endothelial growth factor treatment. 59 Unlike traditional statistical models, ML models are powerful but more complex, and their black-box nature makes them difficult to interpret, which limits further clinical application. We therefore used SHAP to explain the GBM model. Based on the Shapley value from cooperative game theory, SHAP can be viewed as an extension of local interpretable model-agnostic explanations (LIME). 17 SHAP explains the GBM model and its predictions by assigning each feature a contribution to an individual prediction, thereby opening up the model's black box. 60 We compared six ML models for predicting the risk of OM in patients with GA on F1 score, sensitivity, specificity, AUC, accuracy, and other metrics, and identified the best-performing model. Based on this model, we developed a web calculator into which clinicians can enter a patient's indicators to obtain a personalized predicted probability of OM, which may support targeted, precise preventive measures.
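To make the game-theoretic definition behind SHAP concrete: a feature's Shapley value is its marginal contribution to the payoff, averaged over every order in which features can be added. The brute-force sketch below computes this exactly for a hypothetical three-feature payoff function with an interaction term. In SHAP, the payoff of a coalition is the expected model output when only those features are known, and the SHAP package uses much faster algorithms (such as TreeSHAP for tree ensembles) rather than enumerating orderings.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players (features) to a payoff. The cost grows
    as n!, so this is only for illustrating the definition on tiny examples.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)  # marginal contribution
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}

def toy_payoff(coalition):
    """Hypothetical payoff: main effects for A and B plus an A-B interaction."""
    a, b = "A" in coalition, "B" in coalition
    return 2.0 * a + 1.0 * b + 3.0 * (a and b)

phi = shapley_values(["A", "B", "C"], toy_payoff)
print(phi)
```

The values sum to the payoff of all features minus the payoff of none (the efficiency property), the interaction payoff of 3 is split equally between A and B, and the irrelevant feature C receives 0.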
Our study has some limitations. First, it was a single-center study without external validation, and ML performance may vary with the characteristics of patients in different regions. Second, it was retrospective and needs further verification in follow-up studies. Third, the database records only each patient's initial diagnosis, with no follow-up data for further analysis. Finally, OM-positive and OM-negative patients were highly imbalanced in both the training and test sets. In future work, we will obtain large samples from multiple centers to verify the robustness and reproducibility of the model.
Conclusion
We used ML methods to establish risk prediction models for OM in patients with GA and found that the GBM model performed best among the six ML models. This prediction model is a step toward an automated diagnostic system that may help clinicians make accurate diagnoses and apply appropriate preventive measures for OM in patients with GA.
Supplemental Material
sj-pdf-1-tct-10.1177_15330338231219352 - Supplemental material for Prediction Model of Ocular Metastases in Gastric Adenocarcinoma: Machine Learning-Based Development and Interpretation Study
Supplemental material, sj-pdf-1-tct-10.1177_15330338231219352 for Prediction Model of Ocular Metastases in Gastric Adenocarcinoma: Machine Learning-Based Development and Interpretation Study by Jie Zou, Yan-Kun Shen, Shi-Nan Wu, Hong Wei, Qing-Jian Li, San Hua Xu, Qian Ling, Min Kang, Zhao-Lin Liu, Hui Huang, Xu Chen, Yi-Xin Wang, Xu-Lin Liao, Gang Tan and Yi Shao in Technology in Cancer Research & Treatment
Footnotes
Author Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Jie Zou, Yan-Kun Shen, and Shi-Nan Wu. Dr Yi Shao and Dr Gang Tan were the guarantors of integrity of the entire study. The first draft of the manuscript was written by Yan-Kun Shen, Shi-Nan Wu, Hong Wei, and Qing-Jian Li and all authors commented on previous versions of the manuscript. The statistical analysis was performed by San Hua Xu, Qian Ling, and Min Kang. Clinical data were collected by Zhao-Lin Liu, Hui Huang, and Xu Chen. Literature research was performed by Yi-Xin Wang and Xu-Lin Liao. All authors read and approved the final manuscript.
Availability of Data and Materials
The datasets used and/or analyzed during the present study are available from the corresponding author on reasonable request.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval and Consent to Participate
The study methods and protocols were approved by the Medical Ethics Committee of the First Affiliated Hospital of Nanchang University (Nanchang, China) and followed the principles of the Declaration of Helsinki. All subjects were informed of the objectives, content, and potential risks of the study, and then provided written informed consent to participate.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Major (Key) R&D Program of Jiangxi Province, Jiangxi Province Double Thousand Plan Science and Technology Innovation High-end Talent Project (2022), Excellent Talents Development Project of Jiangxi Province, National Natural Science Foundation of China (grant number 20181 bbg70004, 20203BBG73059, 2022103, 2022, 20192BCBL23020, 82160195).
Foundation Item
National Natural Science Foundation (No: 82160195); Jiangxi Province Double Thousand Plan Science and Technology Innovation High-end Talent Project (2022); Major (Key) R&D Program of Jiangxi Province (No: 2022103; 20181 bbg70004; 20203BBG73059); Excellent Talents Development Project of Jiangxi Province (No.: 20192BCBL23020)
References
