Biomarker-based and interpretable machine learning framework for predicting pathological stage in gastric cancer: A retrospective analysis

Abstract

Background

Accurate preoperative staging of gastric cancer (GC) is essential for guiding treatment strategies. However, reliable noninvasive tools for distinguishing early-stage from advanced-stage GC remain limited.

Methods

This retrospective study enrolled 434 patients with GC. Eleven supervised machine learning algorithms were developed using preoperative laboratory parameters and engineered ratio features capturing inflammatory, metabolic, and tumor-related profiles. CatBoost showed superior performance and was selected for SHapley Additive exPlanations (SHAP)-based interpretation. A forward feature selection strategy identified an optimal nine-feature panel. Model performance was evaluated by area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1-score, with robustness validated through repeated 10-fold cross-validation and 1000 bootstrap iterations.

Results

Among 434 patients, 251 (57.8%) had stage I and 183 (42.2%) had stages II–III disease. Incorporating biologically informed ratio features substantially enhanced model performance; CatBoost's AUC improved from 0.802 to 0.981. SHAP-based selection yielded a compact, interpretable nine-feature model. The final CatBoost model achieved a mean AUC of 0.9499 (95% confidence interval [CI]: 0.9421–0.9570), with high consistency across cross-validation folds. SHAP analysis identified uric acid (UA) and activated partial thromboplastin time (APTT) as key predictors, and interaction analysis revealed stable multivariate relationships, supporting the model's biological plausibility.

Conclusions

We developed a robust, interpretable machine learning model for GC staging using routine blood tests and derived ratio features. The model demonstrated excellent discrimination, interpretability, and clinical utility, offering a practical tool for personalized risk stratification and treatment planning.

Keywords

Gastric cancer, blood biomarkers, machine learning models, tumor staging, model interpretability

Introduction

Gastric cancer (GC) remains one of the most prevalent and lethal malignancies globally, ranking as the fifth leading cause of cancer-related death worldwide.¹ Prognostic outcomes in GC are heavily stage-dependent, influencing both therapeutic strategies and survival expectations. Patients diagnosed with stage I GC have favorable outcomes, with 5-year survival rates exceeding 90%.^2–4 In contrast, survival drops markedly in advanced disease, with 5-year survival rates of approximately 60% to 70% for stage II and 20% to 40% for stage III.^5,6

According to current guidelines and clinical trial evidence,^7–12 endoscopic submucosal dissection is the preferred treatment for early-stage GC, aiming to preserve postoperative quality of life. For patients with stage I disease who have undergone curative resection, routine surveillance is typically sufficient. However, adjuvant chemotherapy is recommended following curative gastrectomy for stages II–III GC to improve both overall survival (OS) and disease-free survival (DFS).^13,14 Additionally, mounting evidence supports the use of neoadjuvant therapy in stages II–III GC, which has been shown to enhance resection rates and improve long-term outcomes.^15–19 These findings underscore the critical need for accurate staging at diagnosis to inform individualized treatment strategies and optimize both prognosis and quality of life.

Currently, preoperative staging of GC relies heavily on imaging modalities such as computed tomography (CT), endoscopic ultrasonography, and magnetic resonance imaging. However, these techniques have notable limitations in accuracy, invasiveness, radiation exposure, and cost.^20–23 For instance, Wu et al.²⁴ reported diagnostic accuracies of only 52.6% for stage T1 and 72.7% for stage T2 based on CT alone. Emerging research suggests that peripheral blood indicators may serve as accessible surrogates reflecting the tumor microenvironment.²⁵ Unlike imaging, conventional blood tests, including hematological, biochemical, coagulation, and tumor marker panels, offer a noninvasive, rapid, safe, and cost-effective alternative that can be routinely implemented in clinical practice.

In parallel, machine learning (ML) has rapidly advanced as a powerful tool in medicine, capable of uncovering complex patterns within multidimensional clinical data to support precise decision making.^26–28 Prior studies have demonstrated the utility of ML models based on routine blood indicators in accurately identifying cancers such as lung and colorectal cancer.^29–31 Nevertheless, few studies have explored the integration of ML algorithms with preoperative blood indicators for accurate GC staging, particularly in distinguishing early-stage (stage I) from more advanced disease (stages II–III).

Therefore, this study aimed to develop and validate an interpretable, noninvasive, convenient, and cost-efficient ML model that incorporates routine blood indicators, including complete blood counts, liver and kidney function tests, coagulation parameters, tumor markers, and derived ratio features, for preoperative GC staging. Specifically, the model seeks to differentiate stage I from stages II–III disease, thereby facilitating individualized clinical decision making, minimizing overtreatment in early-stage patients, and enabling timely adjuvant therapy in those with advanced-stage GC.

Design and method

Patient enrollment and data collection

Between 1 May 2020 and 31 December 2024, a total of 867 patients who underwent radical gastrectomy for histologically confirmed GC at the Department of General Surgery, Peking Union Medical College Hospital (PUMCH) were retrospectively reviewed. After applying predefined exclusion criteria (Figure 1), 433 patients were excluded for the following reasons: receipt of neoadjuvant therapy (n = 328), history of other malignancies with cancer-directed treatment within the preceding 12 months or not in complete remission (n = 32), autoimmune or chronic inflammatory disorders (n = 25), age <18 or >90 years (n = 10), and incomplete clinical or pathological data (n = 38). The final cohort included 434 eligible patients, classified according to the eighth edition of the American Joint Committee on Cancer (AJCC) tumor node metastasis (TNM) staging system: 251 with stage I, 93 with stage II, and 90 with stage III disease. Preoperative clinical characteristics and laboratory parameters collected within 1 week prior to surgery were extracted from electronic medical records. These included complete blood counts, liver and renal function tests, serum electrolytes, coagulation profiles, and tumor markers. Mismatch repair (MMR) status was evaluated by immunohistochemistry (IHC) on formalin-fixed paraffin-embedded sections using antibodies against MLH1, MSH2, MSH6, and PMS2. Deficient MMR (dMMR) was defined as complete loss of nuclear staining in tumor cells with preserved staining in internal controls (stromal or lymphoid cells), while proficient MMR (pMMR) was defined as intact nuclear expression of all four proteins. 
Surgical specimens of GC were classified as dMMR if IHC demonstrated complete loss of nuclear expression of one or more MMR proteins (MLH1, PMS2, MSH2, MSH6) in tumor cells with intact internal controls; typical patterns were paired loss of MLH1/PMS2 or MSH2/MSH6, and isolated PMS2 or isolated MSH6 loss was also considered dMMR.^32–34 All data were independently verified by two trained investigators. This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of PUMCH (approval no. I-23PJ2155). Written informed consent was obtained from all participants prior to inclusion.
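The cohort arithmetic reported above can be verified with a few lines of Python. This is only a consistency check on the counts stated in the text; the variable names are illustrative and not part of the study's code.

```python
# Counts copied from the patient-selection paragraph; names are illustrative.
screened = 867
exclusions = {
    "neoadjuvant therapy": 328,
    "other malignancy (recent treatment or not in remission)": 32,
    "autoimmune or chronic inflammatory disorder": 25,
    "age <18 or >90 years": 10,
    "incomplete clinical or pathological data": 38,
}
stage_counts = {"I": 251, "II": 93, "III": 90}

excluded = sum(exclusions.values())
eligible = screened - excluded
advanced = stage_counts["II"] + stage_counts["III"]

print(f"excluded = {excluded}, eligible = {eligible}")
print(f"stage I: {stage_counts['I'] / eligible:.1%}")
print(f"stages II-III: {advanced / eligible:.1%}")
```

The exclusion counts sum to the reported number of excluded patients, and the stage counts sum to the final cohort size.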

Figure 1.

Patient selection flowchart. A total of 867 patients who underwent radical gastrectomy for gastric cancer at PUMCH between May 2020 and December 2024 were screened. After applying exclusion criteria, 434 patients were eligible for analysis and stratified by pathological stage. The dataset was subsequently split into training (80%) and test (20%) sets for machine learning model development and validation.

Data preprocessing and feature engineering

All data preprocessing and feature engineering procedures were conducted using Python (version 3.9). Raw clinical and laboratory variables were standardized into numerical format via regular expression parsing (pandas v1.5.3). Features with more than 40% missing values, namely serum magnesium (Mg), cancer antigen 125 (CA125), and corrected calcium (CorrCa), were excluded, resulting in 71 initial variables. Missing values in the remaining dataset were imputed using the K-nearest neighbors algorithm (KNNImputer, k = 5). To improve biological interpretability, 25 clinically meaningful ratio and composite features were constructed (Supplemental Table S4). These included: (a) tumor marker ratios: CEA/CA19-9, CA19-9/CA72-4, CA72-4/CEA, AFP/CEA, AFP/CA19-9, and AFP/CA72-4; (b) composite tumor marker load: calculated as the sum, mean, and standard deviation of CEA, CA19-9, CA72-4, and AFP; (c) inflammatory indices: neutrophil-to-lymphocyte ratio (NLR), monocyte-to-lymphocyte ratio (MLR), RDW-CV/HCT, and PLT/PDW; (d) coagulation markers: APTT-R/prothrombin time (PT), FIB/APTT, and D-dimer/FIB; (e) biochemical ratios: ApoB/ApoA1, LDL-C/HDL-C, TG/HDL-C, Alb(BCG)/A/G, ALT/AST, and Scr/BUN; (f) electrolyte ratios: Na/K, Cl/Na, and Ca/P. Integration of these engineered features with the original variables yielded a total of 96 candidate predictors. Feature selection was performed using SHapley Additive exPlanations (SHAP, shap v0.44.1) based on an XGBoost model (xgboost v1.7.6). Predictors with a mean absolute SHAP value above the median were retained, resulting in a final feature set of 47 variables. To address class imbalance, the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors (SMOTEENN, imblearn v0.10.1) was applied. All features were standardized using Z-score normalization. The dataset was then stratified into training and testing subsets in an 80:20 ratio for downstream ML model development.
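The ratio features described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the dictionary keys, the `safe_ratio` and `add_ratio_features` helpers, and the example values are hypothetical, and only a handful of the 25 engineered features are shown.

```python
import math

def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    if numerator is None or denominator is None:
        return math.nan
    if math.isnan(numerator) or math.isnan(denominator) or denominator == 0:
        return math.nan
    return numerator / denominator

# (new feature name, numerator key, denominator key); all key names are hypothetical.
RATIO_SPECS = [
    ("NLR", "neutrophils", "lymphocytes"),
    ("MLR", "monocytes", "lymphocytes"),
    ("FIB_APTT", "fibrinogen", "aptt"),
    ("DDIMER_FIB", "d_dimer", "fibrinogen"),
    ("NA_K", "sodium", "potassium"),
]

def add_ratio_features(record):
    """Return a copy of one patient's record with ratio features appended."""
    out = dict(record)
    for name, num_key, den_key in RATIO_SPECS:
        out[name] = safe_ratio(record.get(num_key), record.get(den_key))
    return out

# Made-up values for a single patient, for illustration only.
patient = {
    "neutrophils": 3.0, "lymphocytes": 1.5, "monocytes": 0.3,
    "fibrinogen": 2.6, "aptt": 26.0, "d_dimer": 0.26,
    "sodium": 140.0, "potassium": 4.0,
}
features = add_ratio_features(patient)
print({name: round(features[name], 3) for name, _, _ in RATIO_SPECS})
```

Returning NaN for a zero or missing denominator keeps undefined ratios visible so they can be handled explicitly instead of propagating infinities into the model.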

Machine learning model development and evaluation

To develop a robust model for predicting pathological stage (stage I vs. stages II–III) in patients with GC, 11 supervised ML algorithms were systematically evaluated. These included: XGBoost (xgboost v1.7.6), LightGBM (lightgbm v3.3.5), CatBoost (catboost v1.2), support vector machine (SVM), k-nearest neighbors (KNN), decision tree, random forest, logistic regression, multilayer perceptron (MLPClassifier), AdaBoost, and ExtraTrees (all implemented using scikit-learn v1.2.2). Model development was carried out in three sequential stages to ensure clarity and reproducibility. Stage 1: All 96 available predictors, including 71 raw clinical/laboratory features and 25 engineered ratio or composite features, were used to train the full panel of classifiers for baseline performance comparison. Stage 2: Feature importance was assessed using SHAP derived from the XGBoost model. The top 50% of predictors, ranked by mean absolute SHAP value, were retained for model refinement (n = 47). Stage 3: The reduced feature set (n = 47) was used for final model training, hyperparameter tuning, and evaluation. To address class imbalance, the SMOTEENN algorithm (imblearn v0.10.1) was applied. All features were standardized using Z-score normalization. Models were trained on the resampled training set and evaluated on an independent 20% hold-out test set. Evaluation metrics included area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1-score. Receiver operating characteristic (ROC) curves and performance summary plots were generated using matplotlib (v3.7.1) and seaborn (v0.12.2). Among all classifiers, CatBoost consistently demonstrated the highest overall predictive performance following the incorporation of engineered ratio features. It was therefore selected as the optimal model for downstream interpretability analysis using SHAP.
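The Stage 2 filter (retain predictors whose mean absolute SHAP value lies above the median) can be sketched with NumPy. The sketch assumes a precomputed SHAP matrix of shape (samples, features); the function name `select_above_median`, the feature labels, and the toy values are hypothetical and do not come from the study.

```python
import numpy as np

def select_above_median(shap_values, feature_names):
    """Rank features by mean |SHAP| and keep those strictly above the median."""
    importance = np.abs(shap_values).mean(axis=0)
    keep = importance > np.median(importance)
    order = np.argsort(-importance)  # most important first
    selected = [feature_names[i] for i in order if keep[i]]
    return selected, importance

# Toy SHAP matrix: 3 samples x 4 features (made-up numbers).
toy_shap = np.array([
    [0.2, -1.0, 0.05, 0.6],
    [-0.4, 0.8, 0.05, -0.2],
    [0.3, -1.2, -0.02, 0.4],
])
names = ["feat_a", "feat_b", "feat_c", "feat_d"]

selected, importance = select_above_median(toy_shap, names)
print(dict(zip(names, importance.round(3))))
print(selected)
```

Taking the absolute value before averaging matters: a feature that pushes predictions strongly in both directions would otherwise average out to near zero and be discarded.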

Feature selection, model optimization, and validation

Feature selection was guided by SHAP, which ranked input variables according to their mean absolute contribution to the CatBoost model. A cumulative contribution curve was generated to assess the proportion of total model explanatory power captured by the highest-ranking features. To determine the optimal feature subset, a stepwise evaluation was performed using incrementally increasing numbers of top-ranked features (from 5 to 30) based on SHAP importance. For each subset, data preprocessing involved Z-score normalization and class rebalancing using the SMOTEENN algorithm, followed by an 80:20 stratified train–test split to maintain class distribution and ensure sufficient sample sizes for training and evaluation. Each candidate model was trained using CatBoost with fixed hyperparameters (iterations = 500, learning_rate = 0.05, depth = 6) and evaluated across five performance metrics: AUC, accuracy, precision, recall, and F1-score. The subset yielding the highest AUC was selected as the final model, resulting in a nine-feature classifier that achieved a favorable balance between predictive performance and model simplicity. To comprehensively evaluate model robustness and generalizability, two complementary validation strategies were employed. First, repeated stratified 10-fold cross-validation (10 repetitions; 100 folds in total) was used to estimate performance variability across random splits. The mean ROC curve and its 95% confidence band were plotted to visualize model stability. Second, nonparametric bootstrap resampling (1000 iterations) was applied to derive the empirical distribution of AUC and compute its 95% confidence interval. All analyses were performed in Python 3.9 using pandas, catboost, shap, scikit-learn, and imblearn, with a fixed random seed (random_state = 42) to ensure reproducibility.
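The bootstrap step can be illustrated with a percentile interval for the AUC. This is a sketch under stated assumptions, not the authors' implementation: it resamples held-out labels and scores with replacement (the text does not specify what was resampled), computes the AUC from the rank-sum statistic, and uses synthetic data. The function names `roc_auc` and `bootstrap_auc_ci` are hypothetical.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """AUC from the Mann-Whitney rank-sum statistic, using midranks for ties."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    midranks = np.cumsum(counts) - (counts - 1) / 2.0  # average 1-based rank per value
    ranks = midranks[inverse]
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap: return (mean AUC, lower bound, upper bound)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n = y_true.size
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        y_b = y_true[idx]
        if y_b.min() == y_b.max():
            continue  # skip resamples that contain a single class
        aucs.append(roc_auc(y_b, y_score[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(np.mean(aucs)), float(lower), float(upper)

# Synthetic labels and scores standing in for held-out predictions.
rng = np.random.default_rng(0)
y_demo = np.array([0] * 50 + [1] * 37)
score_demo = y_demo + rng.normal(scale=1.0, size=y_demo.size)

mean_auc, ci_low, ci_high = bootstrap_auc_ci(y_demo, score_demo)
print(f"AUC = {mean_auc:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

With a small evaluation set the percentile interval can be wide and can touch 1.000, which is one reason to report it alongside cross-validated estimates.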

Model explainability via SHAP analysis

To enable transparent interpretation of the final CatBoost model, SHAP was applied using the model-specific TreeExplainer. SHAP values were computed on the independent test set to quantify each feature's marginal contribution to the predicted probability of advanced-stage disease (stages II–III vs. stage I). Global importance was assessed through summary bar plots and distribution visualizations based on mean absolute SHAP values, revealing both the relative influence and directional impact of key predictors. For individualized interpretation, SHAP force plots were generated to attribute case-level risk to specific features, enabling clinically interpretable stratification. In addition, SHAP interaction values were used to capture nonlinear dependencies among features, and a pairwise interaction heatmap was constructed across the top nine predictors to visualize synergistic or antagonistic effects. This multilevel SHAP framework provided both global and local model interpretability, facilitating biological plausibility assessments and enhancing the transparency of ML-assisted gastric cancer staging. All analyses were implemented using the shap Python package (v0.44.1).
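For reference, the additive structure that underlies these plots can be written out. The identities below are the standard SHAP local-accuracy and interaction-value properties, with $M = 9$ features in the final model. The scale on which $f$ is expressed (log-odds or probability) depends on how the explainer was configured, which the text does not specify.

```latex
% Local accuracy: the prediction for patient x decomposes into a base value
% plus one attribution per feature.
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x), \qquad \phi_0 = \mathbb{E}\left[f(X)\right]

% Interaction values: each attribution splits into a main effect \Phi_{i,i}
% and symmetric pairwise terms \Phi_{i,j} = \Phi_{j,i}.
\phi_i(x) = \sum_{j=1}^{M} \Phi_{i,j}(x), \qquad
\Phi_{i,i}(x) = \phi_i(x) - \sum_{j \neq i} \Phi_{i,j}(x)
```

A force plot displays the first identity for a single patient, while a pairwise interaction heatmap summarizes the off-diagonal terms $\Phi_{i,j}$, typically as mean absolute values across patients.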

Statistical analysis

Statistical analyses were performed using SPSS (version 26.0; IBM Corp., Armonk, NY, USA) and Python (version 3.9). The distributional normality of continuous variables was assessed using the Kolmogorov–Smirnov test. Variables conforming to normal distribution were expressed as mean ± standard deviation and compared using independent samples t-tests; non-normally distributed variables were presented as median with interquartile range (IQR) and compared using the Mann–Whitney U test. Categorical variables were summarized as counts and proportions and evaluated using the chi-square test or Fisher's exact test, as appropriate. To identify statistically significant differences between stage I and stages II–III GC, group-wise comparisons were performed for both clinical and laboratory variables. Subgroup analyses across all three TNM stages (I, II, III) were also conducted for exploratory purposes, with detailed results provided in the Supplemental Tables. A two-sided p-value < 0.05 was considered statistically significant. All data preprocessing, visualization, and ML analyses were conducted in Python using the following libraries: pandas (v1.5.3), numpy (v1.23.5), matplotlib (v3.7.1), seaborn (v0.12.2), and scikit-learn (v1.2.2).

Results

Baseline characteristics of the study population

A total of 434 patients with histologically confirmed GC were included in the final analysis, comprising 251 (57.8%) with pathological stage I and 183 (42.2%) with stages II–III disease. Detailed demographic, clinical, and pathological characteristics are summarized in Table 1, and baseline laboratory parameters are presented in Table 2. Significant differences in clinical and pathological features were observed between stage I and stages II–III groups. Patients with advanced-stage disease were more likely to undergo open surgery and to present with poorly differentiated tumors, and the distribution of family history of cancer also differed across stages. Diffuse or mixed Lauren classification, presence of vascular tumor emboli, and perineural invasion were also more prevalent in stages II–III patients. Moreover, the IHC results for the individual MMR proteins (MLH1, MSH2, MSH6, and PMS2) and EBER-ISH status differed significantly between groups, whereas overall MMR status (deficiency vs. proficiency) did not (Table 1). Overall, dMMR was identified in 25 of 434 tumors (5.8%), with no significant difference (p = 0.52) in dMMR rate between stage I (16/251, 6.4%) and stages II–III (9/183, 4.9%). Because the absolute number of dMMR tumors was small (n = 25), the study may be underpowered to detect small stage-wise differences in dMMR prevalence. Nevertheless, the numerically higher dMMR rate in stage I is consistent with prior literature showing that MSI-H/dMMR GCs tend to present at earlier pathological stages.^35–37 Future studies with larger samples will be needed to further explore the relationship between dMMR rate and GC staging. As expected, T and N staging categories differed significantly between groups, confirming progressive pathological burden. In terms of preoperative laboratory indicators, several markers showed statistically significant stage-related differences. Compared with stage I, patients with stages II–III GC exhibited significantly lower levels of prealbumin, albumin, UA, hemoglobin, HCT, MCH, and MCHC. 
In contrast, fibrinogen and RDW-CV were significantly elevated in advanced-stage disease, whereas APTT, APTT ratio, and thrombin time were significantly lower (Table 2). Additionally, ChE levels differed markedly between groups, potentially reflecting nutritional and metabolic alterations associated with tumor progression. Further subgroup analyses (Supplemental Tables S1–S3) revealed that differences in laboratory parameters were most pronounced between stage I and stage III, including lower UA, prealbumin, and ChE, alongside higher fibrinogen and shorter APTT in stage III. Thus, stage I gastric cancers displayed distinct hematologic, metabolic, and nutritional profiles compared with stage II disease, whereas stages II and III were largely comparable, differing only in fibrinogen levels. These patterns suggest marked biological separation between early- and advanced-stage disease, but relative homogeneity across advanced stages.
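The dMMR comparison reported above (16/251 in stage I vs. 9/183 in stages II–III) can be checked with a short calculation. The sketch assumes an uncorrected Pearson chi-square test on the 2 × 2 table, since the text does not state which test produced the reported p-value; `chi_square_2x2` is a hypothetical helper name.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: stage I, stages II-III; columns: dMMR, pMMR (counts from the text).
chi2, p = chi_square_2x2(16, 251 - 16, 9, 183 - 9)
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```

Under this assumption the p-value agrees with the reported 0.52 after rounding. A continuity-corrected or exact test would give a somewhat different value.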

Table 1.

Comparison of basic clinical characteristics in stages I to III GC groups.

Variables	Groups (%)			p-Value
Variables	Stage I (n = 251)	Stage Ⅱ (n = 93)	Stage Ⅲ (n = 90)	p-Value
Gender, n (%)
Female	81 (32.27)	29 (31.18)	39 (43.33)	0.128
Male	170 (67.73)	64 (68.82)	51 (56.67)	0.128
Age (years)	60.000 (55.0,67.0)	62.000 (55.0,68.0)	60.000 (50.0,68.0)	0.52
Body mass index (kg/m²)	24.112 (22.1,25.9)	24.221 (21.8,26.0)	23.227 (21.0,26.3)	0.536
Smoking history, n (%)
No	135 (53.78)	52 (55.91)	60 (66.67)	0.104
Yes	116 (46.22)	41 (44.09)	30 (33.33)	0.104
Drinking history, n (%)
No	149 (59.36)	62 (66.67)	63 (70.00)	0.145
Yes	102 (40.64)	31 (33.33)	27 (30.00)	0.145
Diabetes, n (%)
No	207 (82.47)	74 (79.57)	77 (85.56)	0.567
Yes	44 (17.53)	19 (20.43)	13 (14.44)	0.567
Personal history of other cancers, n (%)
No	219 (87.25)	86 (92.47)	80 (88.89)	0.396
Yes	32 (12.75)	7 (7.53)	10 (11.11)	0.396
Family history of cancer, n (%)
No	202 (80.48)	69 (74.19)	80 (88.89)	0.040*
Yes	49 (19.52)	24 (25.81)	10 (11.11)	0.040*
Tumor location, n (%)
Fundus	31 (12.35)	17 (18.28)	9 (10.00)	0.492
Corpus	45 (17.93)	18 (19.35)	17 (18.89)
Antrum and pylorus	175 (69.72)	58 (62.37)	64 (71.11)
Surgical approach, n (%)
Open	1 (0.40)	2 (2.15)	4 (4.44)	0.029*
Laparoscopy	250 (99.60)	91 (97.85)	86 (95.56)	0.029*
Surgical procedure, n (%)
Subtotal gastrectomy	215 (85.66)	73 (78.49)	74 (82.22)	0.268
Total gastrectomy	36 (14.34)	20 (21.51)	16 (17.78)	0.268
Differentiation, n (%)
Unknown	14 (5.58)	3 (3.23)	2 (2.22)	＜0.001***
High	24 (9.56)	2 (2.15)	0 (0.00)
Moderate	57 (22.71)	15 (16.13)	9 (10.00)
Low	156 (62.15)	73 (78.49)	79 (87.78)
Lauren type, n (%)
Diffuse type	64 (25.50)	27 (29.03)	33 (36.67)	0.009**
Intestine type	102 (40.64)	25 (26.88)	18 (20.00)
Mixed type	55 (21.91)	26 (27.96)	29 (32.22)
Unclassified type	30 (11.95)	15 (16.13)	10 (11.11)
Vascular tumor embolus, n (%)
No	217 (86.45)	52 (55.91)	30 (33.33)	＜0.001***
Yes	34 (13.55)	41 (44.09)	60 (66.67)	＜0.001***
Perineural invasion, n (%)
No	237 (94.42)	54 (58.06)	39 (43.33)	＜0.001***
Yes	14 (5.58)	39 (41.94)	51 (56.67)	＜0.001***
MMR, n (%)
pMMR	235 (93.63)	86 (92.47)	88 (97.78)	0.336
dMMR	16 (6.37)	7 (7.53)	2 (2.22)
MLH-1, n (%)
Unknown	37 (14.74)	2 (2.15)	2 (2.22)	＜0.001***
Positive	198 (78.88)	84 (90.32)	86 (95.56)
Negative	16 (6.37)	7 (7.53)	2 (2.22)
MSH-2, n (%)
Unknown	37 (14.74)	2 (2.15)	2 (2.22)	＜0.001***
Positive	212 (84.46)	91 (97.85)	88 (97.78)
Negative	2 (0.80)	0 (0.00)	0 (0.00)
MSH-6, n (%)
Unknown	37 (14.74)	2 (2.15)	2 (2.22)	＜0.001***
Positive	210 (83.67)	91 (97.85)	88 (97.78)
Negative	4 (1.59)	0 (0.00)	0 (0.00)
PMS-2, n (%)
Unknown	37 (14.74)	2 (2.15)	2 (2.22)	＜0.001***
Positive	198 (78.88)	84 (90.32)	86 (95.56)
Negative	16 (6.37)	7 (7.53)	2 (2.22)
Her-2, n (%)
Negative	83 (38.07)	46 (50.55)	40 (44.94)	0.168
Low-positive	77 (35.32)	22 (24.18)	30 (33.71)
Moderate-positive	50 (22.94)	17 (18.68)	13 (14.61)
High-positive	8 (3.67)	6 (6.59)	6 (6.74)
EBER-ISH, n (%)
Unknown	59 (23.51)	6 (6.45)	8 (8.89)	＜0.001***
Positive	12 (4.78)	3 (3.23)	2 (2.22)
Negative	180 (71.71)	84 (90.32)	80 (88.89)
T stage, n (%)
1	208 (82.87)	12 (12.90)	0 (0.00)	＜0.001***
2	43 (17.13)	21 (22.58)	14 (15.56)
3	0 (0.00)	52 (55.91)	26 (28.89)
4	0 (0.00)	8 (8.60)	50 (55.56)
N stage, n (%)
0	231 (92.03)	36 (38.71)	2 (2.22)	＜0.001***
1	20 (7.97)	35 (37.63)	10 (11.11)
2	0 (0.00)	19 (20.43)	23 (25.56)
3	0 (0.00)	3 (3.23)	55 (61.11)

Data are presented as mean ± SD or median (interquartile range) for continuous variables, and n (%) for categorical variables. p-Values were calculated using χ² test or Fisher's exact test for categorical variables and Student's t test or Mann–Whitney U test for continuous variables, as appropriate. *p < 0.05, **p < 0.01, ***p < 0.001.

dMMR: deficient mismatch repair proteins; EBER-ISH: Epstein-Barr Virus-encoded RNA in situ hybridization; GC: gastric cancer; pMMR: proficient MMR.

Table 2.

Comparison of baseline blood indicators among stages I to III GC groups.

Variables	Groups			p-Value
Variables	Stage I (n = 251)	Stage Ⅱ (n = 93)	Stage Ⅲ (n = 90)	p-Value
Lymphocytes percentage (%)	31.30 ± 8.28	31.75 ± 8.29	31.73 ± 8.40	0.858
Neutrophil percentage (%)	58.71 ± 8.72	57.91 ± 8.73	58.42 ± 8.64	0.751
Prealbumin (mg/L)	256.23 ± 45.92	247.53 ± 47.48	241.85 ± 49.42	0.041*
Total carbon dioxide (mmol/L)	27.73 ± 2.44	28.01 ± 2.20	27.63 ± 2.56	0.529
Inorganic phosphorus (mmol/L)	1.20 ± 0.16	1.20 ± 0.18	1.23 ± 0.15	0.447
Total cholesterol (mmol/L)	4.75 ± 1.01	4.55 ± 0.99	4.55 ± 1.12	0.15
Low-density lipoprotein cholesterol (mmol/L)	2.93 ± 0.87	2.80 ± 0.87	2.76 ± 0.88	0.258
Non-high-density lipoprotein cholesterol (mmol/L)	3.66 ± 0.98	3.34 ± 0.91	3.59 ± 1.00	0.064
Prothrombin activity (%)	100.82 ± 10.08	100.42 ± 8.39	98.89 ± 9.34	0.262
White blood cells (10⁹/L)	5.660 (4.8,6.6)	5.480 (4.6,6.4)	5.345 (4.4,6.5)	0.057
Monocytes (%)	6.000 (5.2,7.2)	6.100 (5.2,6.9)	5.900 (4.9,7.4)	0.745
Eosinophils (%)	1.900 (1.3,3.0)	2.300 (1.4,3.9)	2.000 (1.2,2.8)	0.153
Basophils (%)	0.500 (0.3,0.6)	0.500 (0.3,0.7)	0.500 (0.3,0.6)	0.964
Lymphocytes (10⁹/L)	1.740 (1.4,2.1)	1.670 (1.3,2.1)	1.630 (1.3,2.0)	0.317
Monocytes (10⁹/L)	0.350 (0.3,0.4)	0.320 (0.3,0.4)	0.310 (0.3,0.4)	0.049*
Neutrophils (10⁹/L)	3.260 (2.7,4.1)	3.130 (2.4,3.8)	3.105 (2.4,3.9)	0.136
Eosinophils (10⁹/L)	0.110 (0.1,0.2)	0.110 (0.1,0.2)	0.100 (0.1,0.2)	0.359
Basophils (10⁹/L)	0.030 (0.0,0.0)	0.030 (0.0,0.0)	0.020 (0.0,0.0)	0.406
Red blood cells (10¹²/L)	4.550 (4.3,4.8)	4.430 (4.1,4.8)	4.330 (4.1,4.7)	0.002**
Hemoglobin (g/L)	141.000 (130.0,149.0)	131.000 (119.5,145.0)	133.000 (116.5,143.0)	＜0.001***
Hematocrit (%)	41.700 (38.7,43.9)	39.600 (36.3,43.5)	40.150 (35.4,42.6)	＜0.001***
Mean corpuscular volume (fL)	90.900 (87.7,93.8)	90.200 (85.7,93.2)	90.300 (86.3,93.0)	0.187
Mean corpuscular hemoglobin concentration (g/L)	337.000 (329.0,345.0)	334.000 (322.0,339.5)	333.000 (324.0,342.0)	＜0.001***
Mean corpuscular hemoglobin (pg)	30.600 (29.6,31.7)	30.200 (28.6,31.3)	30.350 (28.2,31.6)	0.012*
Red cell volume distribution width-SD (fL)	41.500 (39.7,43.5)	41.850 (39.6,43.6)	42.400 (40.3,44.7)	0.104
Red cell volume distribution width-CV (%)	12.900 (12.3,13.7)	13.100 (12.4,14.3)	13.250 (12.7,14.3)	0.024*
Platelets (10⁹/L)	205.000 (175.0,242.0)	202.000 (186.5,250.5)	213.000 (173.8,262.0)	0.523
Plateletcrit (%)	0.200 (0.2,0.2)	0.200 (0.2,0.3)	0.200 (0.2,0.2)	0.723
Platelet distribution width (fL)	12.800 (10.9,47.7)	12.200 (10.3,48.4)	13.050 (10.7,47.2)	0.545
Mean platelet volume (fL)	9.900 (9.1,10.5)	9.700 (9.1,10.3)	9.800 (9.2,10.4)	0.315
Platelet-large cell ratio (%)	25.750 (21.5,30.4)	23.250 (19.9,28.1)	24.400 (20.8,28.3)	0.194
Alanine aminotransferase (U/L)	17.000 (13.0,24.0)	15.000 (11.0,23.0)	15.000 (11.0,21.3)	0.065
Total protein (g/L)	67.000 (64.0,70.8)	65.500 (62.0,71.0)	66.000 (63.0,71.0)	0.209
Albumin (g/L)	42.000 (40.0,44.0)	41.000 (38.0,44.0)	41.000 (39.0,43.3)	0.040*
Albumin globulin ratio	1.700 (1.5,1.9)	1.600 (1.5,1.9)	1.700 (1.5,1.8)	0.411
Total bilirubin (µmol/L)	11.750 (9.5,15.4)	11.600 (8.3,14.9)	11.350 (8.3,15.0)	0.2
Direct bilirubin (µmol/L)	3.550 (2.8,4.8)	3.500 (2.7,4.7)	3.450 (2.6,4.6)	0.55
Glutamyl transpeptidase (U/L)	21.000 (15.0,29.0)	19.000 (14.0,25.0)	17.000 (13.0,27.0)	0.085
Alkaline phosphatase (U/L)	70.000 (61.0,85.0)	71.000 (58.8,86.0)	71.000 (60.0,82.0)	0.986
Aspartate aminotransferase (U/L)	18.000 (16.0,23.0)	18.000 (15.0,23.0)	18.000 (15.0,23.0)	0.582
Total bile acid (µmol/L)	3.300 (2.1,5.1)	2.950 (1.7,5.3)	3.000 (2.1,5.2)	0.647
Lactic dehydrogenase (U/L)	168.000 (152.0,188.0)	165.000 (146.5,191.3)	168.000 (149.0,201.0)	0.601
Cholinesterase (kU/L)	7.300 (6.6,8.3)	6.900 (5.8,7.7)	6.800 (5.9,7.7)	＜0.001***
Potassium (mmol/L)	4.100 (3.8,4.3)	4.000 (3.8,4.2)	4.000 (3.8,4.3)	0.609
Sodium (mmol/L)	141.000 (140.0,142.0)	141.000 (140.0,142.0)	141.000 (140.0,142.0)	0.287
Chloride (mmol/L)	106.000 (104.0,107.0)	106.000 (104.0,107.0)	106.000 (104.0,107.0)	0.967
Calcium (mmol/L)	2.290 (2.2,2.4)	2.260 (2.2,2.3)	2.275 (2.2,2.3)	0.074
Creatinine (µmol/L)	71.000 (62.0,79.0)	71.000 (62.0,78.5)	67.000 (58.8,79.5)	0.32
Urea (mmol/L)	5.210 (4.3,6.3)	5.120 (4.1,6.1)	4.815 (4.1,5.7)	0.073
Glucose (mmol/L)	5.200 (4.8,5.8)	5.100 (4.8,6.0)	5.300 (4.9,5.7)	0.719
Uric acid (µmol/L)	328.000 (265.0,375.0)	296.500 (246.0,353.3)	292.000 (253.0,324.0)	＜0.001***
Triglycerides (mmol/L)	1.350 (0.9,1.8)	1.165 (0.8,1.7)	1.180 (0.9,1.6)	0.071
High-density lipoprotein cholesterol (mmol/L)	1.070 (0.9,1.3)	1.100 (0.9,1.3)	1.060 (0.9,1.2)	0.584
Apolipoprotein A1 (g/L)	1.250 (1.1,1.4)	1.265 (1.1,1.4)	1.260 (1.1,1.4)	0.749
Apolipoprotein B (g/L)	0.940 (0.8,1.1)	0.890 (0.7,1.1)	0.885 (0.7,1.1)	0.174
Lipoprotein (a) (mg/L)	91.000 (42.0,189.5)	100.500 (47.0,222.5)	114.000 (56.0,229.0)	0.501
C-reactive protein (hypersensitivity) (mg/L)	0.910 (0.4,1.9)	0.900 (0.5,1.8)	1.200 (0.6,2.6)	0.228
Free fatty acids (µmol/L)	400.000 (283.5,556.0)	416.000 (294.3,588.3)	445.000 (289.0,629.3)	0.491
Prothrombin time (s)	11.400 (11.0,11.9)	11.400 (11.1,11.8)	11.600 (11.1,12.0)	0.155
International normalized ratio	0.970 (0.9,1.0)	0.960 (0.9,1.0)	0.980 (0.9,1.0)	0.104
Fibrinogen (g/L)	2.640 (2.3,3.0)	2.630 (2.4,3.1)	2.840 (2.4,3.4)	0.012*
Activated partial thromboplastin time (s)	26.800 (25.4,28.1)	25.900 (24.9,27.4)	26.050 (25.0,27.4)	0.003**
Activated partial thromboplastin time ratio	0.990 (0.9,1.0)	0.960 (0.9,1.0)	0.960 (0.9,1.0)	0.003**
Thrombin time (s)	16.800 (16.4,17.4)	16.700 (16.2,17.3)	16.550 (16.0,17.1)	0.009**
D-dimer (mg/L FEU)	0.250 (0.2,0.4)	0.315 (0.2,0.5)	0.290 (0.2,0.6)	0.195
CA242 (U/mL)	6.400 (4.0,12.4)	8.700 (5.0,16.2)	8.000 (3.9,12.4)	0.135
CA19-9 (U/mL)	9.350 (6.7,15.6)	9.500 (6.3,16.4)	10.300 (7.2,15.9)	0.591
CA72-4 (U/mL)	3.350 (2.1,7.3)	4.000 (2.1,7.2)	4.100 (1.8,9.6)	0.648
CEA (ng/mL)	1.800 (1.3,2.7)	2.100 (1.2,3.6)	1.700 (1.0,3.2)	0.212
AFP (ng/mL)	2.400 (1.9,3.6)	2.300 (1.8,3.5)	2.500 (1.9,4.0)	0.515

AFP: alpha-fetoprotein; CA19-9: carbohydrate antigen 19-9; CA242: carbohydrate antigen 242; CA72-4: carbohydrate antigen 72-4; CEA: carcinoembryonic antigen.

Performance of machine learning models without and with ratio features

Baseline benchmarking using the original preoperative biomarkers revealed that several ensemble-based classifiers outperformed simpler algorithms in distinguishing early- from advanced-stage GC. ExtraTrees (AUC = 0.812), CatBoost (AUC = 0.802), and MLPClassifier (AUC = 0.802) achieved the highest discriminative capacity, whereas models such as KNN (AUC = 0.686), Decision Tree (AUC = 0.676), and AdaBoost (AUC = 0.663) showed limited predictive value (Figure 2(A) and (B)). These findings emphasize the advantage of advanced algorithms in capturing complex, nonlinear patterns embedded within raw clinical variables. To enhance predictive informativeness and capture interaction effects overlooked by single-variable inputs, a set of biologically relevant ratio features, such as RDW/HCT, AFP/CEA, TG/HDL, and APTT/PT, was engineered and integrated alongside the original biomarkers. This augmentation increased the AUC of every tested algorithm (Figure 2(D) and (E)). CatBoost attained the highest AUC after augmentation, rising from 0.802 to 0.981, with accuracy improving from 0.657 to 0.885 and F1-score from 0.657 to 0.876 (Table 3). Gains of similar magnitude were observed for XGBoost (AUC = 0.767 → 0.925) and LightGBM (AUC = 0.742 → 0.931), and even relatively low-performing models such as SVM (AUC = 0.735 → 0.856) and KNN (AUC = 0.686 → 0.850) showed substantially higher AUCs, illustrating the broad applicability of these engineered ratios. Model interpretability analyses further clarified the impact of feature enhancement. In the raw-feature setting, top contributors included UA, APTT, CEA, and CA242, consistent with established roles for metabolic, coagulative, and tumor-associated pathways in GC progression (Figure 2(C)). 
Following ratio feature integration, composite indices such as RDW/HCT and AFP/CEA surpassed many traditional markers in predictive importance, although UA and CA242 retained their high rankings (Figure 2(F)), underscoring their persistent relevance for stage differentiation. Robustness was assessed through sensitivity analyses across multiple training–test partitions (80:20, 70:30, 60:40) and resampling strategies (SMOTE-ENN vs. no resampling). As shown in Supplemental Figure S1, SMOTE-ENN application yielded consistent performance gains in all configurations, with CatBoost achieving the highest AUCs, up to 0.981 in the 80:20 split. In contrast, omitting resampling resulted in notable performance declines, illustrating the necessity of addressing class imbalance during model development.

Table 3.

Performance of machine learning classifiers with raw features alone versus combined raw and engineered ratio features.

Model	Original features					Original + ratio features
	AUC (95% CI)	Accuracy	Precision	Recall	F1-score	AUC (95% CI)	Accuracy	Precision	Recall	F1-score
XGBoost	0.767 (0.675–0.847)	0.706	0.706	0.706	0.706	0.925 (0.794–1.000)	0.885	0.876	0.888	0.880
LightGBM	0.742 (0.648–0.828)	0.676	0.678	0.676	0.676	0.931 (0.812–1.000)	0.846	0.854	0.819	0.830
CatBoost	0.802 (0.717–0.875)	0.657	0.657	0.657	0.657	0.981 (0.927–1.000)	0.885	0.886	0.869	0.876
SVM	0.735 (0.642–0.825)	0.676	0.677	0.676	0.676	0.856 (0.674–1.000)	0.846	0.839	0.856	0.842
KNN	0.686 (0.584–0.790)	0.647	0.651	0.647	0.645	0.850 (0.688–0.973)	0.692	0.683	0.638	0.639
DecisionTree	0.676 (0.583–0.765)	0.676	0.677	0.676	0.676	0.712 (0.538–0.882)	0.692	0.702	0.712	0.690
RandomForest	0.774 (0.687–0.853)	0.696	0.696	0.696	0.696	0.962 (0.876–1.000)	0.923	0.919	0.919	0.919
LogisticRegression	0.749 (0.652–0.839)	0.725	0.727	0.725	0.725	0.800 (0.603–0.950)	0.769	0.762	0.775	0.764
MLPClassifier	0.802 (0.715–0.882)	0.735	0.735	0.735	0.735	0.838 (0.644–1.000)	0.846	0.839	0.856	0.842
AdaBoost	0.663 (0.549–0.758)	0.598	0.598	0.598	0.598	0.881 (0.732–0.982)	0.808	0.797	0.806	0.800
ExtraTrees	0.812 (0.726–0.883)	0.706	0.709	0.706	0.705	0.953 (0.853–1.000)	0.923	0.917	0.938	0.921

Note: Models were trained with 71 original features or with 71 + 25 engineered ratio features (total = 96 predictors). Features were filtered using SHAP-based selection. AUC values are shown with bootstrapped 95% CIs (1000 iterations); other metrics are point estimates on the test set.

AUC: area under the curve; CI: confidence interval; ExtraTrees: Extra Trees Classifier; KNN: K-Nearest Neighbors; LightGBM: Light Gradient Boosting Machine; MLP: multilayer perceptron; SHAP: SHapley Additive exPlanations; SVM: Support Vector Machine.
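The bootstrapped AUC intervals in Table 3 can be approximated with a percentile bootstrap. The sketch below is a generic, standard-library illustration assuming binary 0/1 labels; the function names, the seed, and the rule of redrawing resamples that lack one class are assumptions, not details reported by the authors.

```python
import random


def auc_score(y_true, y_score):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC; resamples lacking a class are redrawn."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes present in this resample
            stats.append(auc_score(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy labels and scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))  # 0.75
print(bootstrap_auc_ci(labels, scores))
```

With a small test set, many resamples are perfectly separated, which is one plausible reason a percentile interval can reach an upper limit of 1.000.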

Figure 2.

Comparative performance of machine learning models and SHAP-based feature selection. (A) ROC curves of eleven ML models using SHAP-selected features; (B) comparisons of AUC, accuracy, precision, recall, and F1-score in all models; (C) SHAP feature importance (top 20) derived from original indicators; (D) ROC curves after incorporating ratio features; (E) performance metrics of models with ratio features added; (F) SHAP importance (top 20) based on the final input set combining original and ratio features. AUC: area under the curve; ML: machine learning; ROC: receiver operating characteristic; SHAP: SHapley Additive exPlanations.

SHAP-based feature optimization and model validation

To identify the optimal subset of predictive features, SHAP values from the full CatBoost model were ranked by mean absolute contribution. Cumulative SHAP analysis demonstrated that the top 30 features explained 61.3% of total model contribution, with a plateauing trend beyond this point, suggesting diminishing returns from lower-ranked variables (Figure 3(A)). To assess how feature inclusion affected model performance, CatBoost models were retrained using incremental subsets of the top 30 features. As shown in Figure 3(B), model performance, measured by AUC, accuracy, precision, recall, and F1-score, improved rapidly with the first few features and peaked when using the top nine. Beyond this point, performance oscillated, and further inclusion of features did not yield consistent gains. The nine-feature model achieved the best tradeoff, with an AUC of 0.981, accuracy of 0.885, precision of 0.886, recall of 0.869, and F1-score of 0.876.

Robustness and generalizability of this nine-feature model were assessed through two complementary strategies. First, repeated stratified 10-fold cross-validation (30 repeats; 300 folds in total) showed consistently high performance, with a mean AUC of 0.9499 ± 0.0645 (mean ± SD across folds) (Figure 3(C)). Second, nonparametric bootstrap resampling (n = 2000) yielded an approximately normal AUC distribution with a mean of 0.9499 and a 95% CI of 0.9421–0.9571, further supporting model stability (Figure 3(D)).

Collectively, these findings demonstrate that SHAP-informed feature optimization enabled the construction of a parsimonious yet high-performing CatBoost model. Its reproducible performance across cross-validation and bootstrap testing supports its reliability for preoperative stage prediction in GC.
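The ranking-then-retraining procedure above can be sketched in two small functions: one computes the cumulative share of importance for features ranked by mean absolute SHAP value, and one scores the top-1, top-2, ... subsets and keeps the best. This is a simplified illustration; `cumulative_share`, `best_top_k`, and the caller-supplied `score_fn` are hypothetical names, and in practice `score_fn` would retrain a classifier and return a validation metric.

```python
def cumulative_share(importances):
    """Cumulative fraction of total importance, in descending rank order."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(v for _, v in ranked)
    running, out = 0.0, []
    for name, value in ranked:
        running += value
        out.append((name, running / total))
    return out


def best_top_k(ranked_names, score_fn, max_k=None):
    """Score the top-1, top-2, ... subsets and return (k, score) of the best.

    `score_fn` is supplied by the caller (for example, a function that
    retrains a model on the subset and returns a validation AUC). Ties
    go to the smaller k, favouring the more parsimonious panel.
    """
    max_k = max_k or len(ranked_names)
    best_k, best_score = 0, float("-inf")
    for k in range(1, max_k + 1):
        score = score_fn(ranked_names[:k])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Illustrative importances and scores only; not values from the study.
mean_abs_shap = {"f_a": 5.0, "f_b": 3.0, "f_c": 2.0}
shares = cumulative_share(mean_abs_shap)
ranked = [name for name, _ in shares]
toy_auc = {1: 0.70, 2: 0.85, 3: 0.85}
k, score = best_top_k(ranked, lambda feats: toy_auc[len(feats)])
print(shares)
print(k, score)  # 2 0.85
```

Breaking ties toward the smaller subset mirrors the stated goal of a compact panel; the study's exact stopping rule is not described beyond the performance peak at nine features.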

Figure 3.

SHAP-based feature selection and validation performance of the final CatBoost model. (A) Cumulative SHAP value contribution curve demonstrating feature-wise additive importance. (B) Performance metrics of the CatBoost model (AUC, accuracy, precision, recall, and F1-score) plotted against increasing number of SHAP-ranked features. (C) ROC curve derived from repeated 10-fold cross-validation (30 repeats), showing robust predictive stability; error bars denote ±1 SD across repeats (mean AUC = 0.9499 ± 0.0645). (D) Bootstrap-derived distribution of AUC values from 2000 iterations, confirming high reproducibility; shaded area indicates the 95% CI (mean AUC = 0.9499; 95% CI: 0.9421–0.9571). AUC: area under the curve; CI: confidence interval; ROC: receiver operating characteristic; SHAP: SHapley Additive exPlanations.

Model interpretability based on SHAP analysis

To elucidate the decision-making process of the final CatBoost model and enhance clinical interpretability, SHAP analysis was performed. Ranking features by mean absolute SHAP value identified APTT, UA, RDW/HCT, and Eos% as the most influential predictors (Figure 4(A)). The SHAP summary plot (Figure 4(B)) demonstrated consistent contribution patterns: prolonged APTT and reduced UA increased the predicted probability of stages II–III disease, and elevated RDW/HCT and Eos% likewise favored advanced-stage classification. These trends align with established associations linking coagulation abnormalities, metabolic depletion, erythropoietic dysregulation, and systemic inflammation to tumor progression.

At the patient level, SHAP waterfall plots (Figure 4(C)) illustrated how individual predictions arise from the interplay of additive and counteracting feature effects. In one representative case predicted as stages II–III, an elevated AFP/CEA ratio, prolonged APTT, and increased ChE acted as primary risk enhancers, while higher UA and Eos% exerted protective effects. This combination reflects the interaction between tumor burden, hepatic function, coagulation status, and immune activity, which are core domains of the host–tumor interface.

Further exploration of SHAP decision plots and interaction maps revealed that top-ranked features did not operate in a purely additive manner. Instead, nonlinear, context-dependent interactions were evident. For instance, the predictive impact of APTT was amplified in patients with low UA, suggesting a compounded risk from concurrent coagulopathy and metabolic insufficiency. Likewise, the influence of RDW/HCT was greater when Eos% was elevated, implicating inflammation-driven hematologic remodeling. These relationships were corroborated by SHAP interaction heatmaps (Figure 4(D)), which showed strong self-interactions for APTT and UA, as well as moderate cross-interactions such as APTT–UA and RDW/HCT–Eos%.

Validation across the broader cohort confirmed these patterns. As shown in Supplemental Figure S2, extreme values of APTT, RDW/HCT, and Eos% consistently corresponded to advanced-stage predictions, whereas elevated UA, ChE, and MCH were more frequently associated with early-stage classification. Collectively, the SHAP interpretability framework demonstrates that the model integrates coagulation imbalance, oxidative metabolism, erythropoiesis, and tumor marker burden into a cohesive and mechanistically grounded approach for stage stratification in gastric cancer.
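The additive logic behind a waterfall plot can be made concrete with a short sketch. It assumes the SHAP values are expressed in log-odds units, as is common for gradient-boosted classifiers, so that the base value plus the sum of the contributions gives the model margin, which a sigmoid converts to a probability. The function names and all numbers below are invented for illustration and are not taken from Figure 4(C).

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def explain_prediction(base_value, contributions):
    """Rebuild a log-odds prediction from a base value plus SHAP contributions.

    Returns the final probability and the contributions ordered by
    absolute size, the order a waterfall plot typically displays.
    """
    margin = base_value + sum(contributions.values())
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return sigmoid(margin), ordered


# Invented contributions (log-odds units): positive values push toward
# stages II-III, negative values toward stage I.
example = {"AFP/CEA": 0.9, "APTT": 0.6, "ChE": 0.3, "UA": -0.5, "Eos%": -0.2}
prob, ordered = explain_prediction(base_value=-0.3, contributions=example)
print(round(prob, 3))
for name, value in ordered:
    print(f"{name:8s} {value:+.2f}")
```

In this toy case the three positive contributions outweigh the two negative ones, so the predicted probability of advanced-stage disease exceeds 0.5, mirroring how risk-enhancing and protective features offset each other in a single patient's explanation.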

Figure 4.

SHAP-based interpretability analysis of the final CatBoost model. (A) Mean SHAP values for the top 9 features ranked by global importance; (B) SHAP summary plot illustrating feature effect directions and value distributions; (C) SHAP waterfall plot for a representative stages II–III patient showing individualized feature contributions to the final prediction (random sample); (D) SHAP interaction heatmap quantifying mean pairwise feature interaction strengths among the top 9 predictors. SHAP: SHapley Additive exPlanations.

Discussion

In this retrospective cohort study, an interpretable ML model was developed and validated to differentiate early-stage (stage I) from advanced-stage (stages II–III) GC using routinely collected preoperative blood biomarkers. Among 11 algorithms evaluated, CatBoost demonstrated the most robust discriminative performance, with the AUC increasing from 0.802 (raw features) to 0.981 following the incorporation of biologically informed ratio features. SHAP analysis identified a parsimonious panel of nine top-ranking features, enabling accurate predictions while maintaining clinical interpretability. These results highlight the potential of integrating engineered biomarkers that reflect underlying physiological interactions to achieve high predictive accuracy without compromising feasibility.

While ML has been increasingly applied to cancer staging using accessible clinical data, most prior GC models remain constrained by either suboptimal accuracy or limited clinical scalability.^38–43 Logistic regression and random forest approaches, though achieving moderate performance, are restricted by linear assumptions, reliance on isolated predictors, and minimal feature abstraction. More sophisticated frameworks integrating deep learning or multi-omics data have reported higher AUCs, sometimes exceeding 0.92, but these approaches often require transcriptomic profiling, specialized assays, or high-throughput platforms, limiting adoption in routine practice, especially in resource-limited settings. In contrast, the present model leverages only standard preoperative laboratory tests available across healthcare systems. The inclusion of biologically informed ratio features, such as RDW/HCT, AFP/CEA, and ChE/LDH, enables the capture of nonlinear interactions among hematological, inflammatory, and tumor-associated processes, aligning data-driven predictions with known pathophysiological mechanisms.

Several of the most influential features identified by SHAP analysis have biologically plausible links to GC progression. Reduced UA and prolonged APTT, alongside elevated RDW/HCT and eosinophil percentage, were consistently associated with advanced-stage disease. UA emerged as a top predictor across both raw and ratio-enhanced models. While hyperuricemia has been implicated in tumor initiation through pro-oxidant and mitogenic pathways, accumulating evidence, including our findings, suggests that low UA levels are linked to advanced GC, potentially reflecting cancer-associated cachexia, antioxidant depletion, or impaired purine metabolism.^44–47 APTT, a measure of intrinsic coagulation activity, showed a positive association with advanced disease, consistent with prior studies linking its prolongation to tumor-associated coagulopathy, endothelial dysfunction, and hepatic impairment.^48,49 The RDW/HCT ratio, by integrating red cell size variability with erythrocyte volume, provides a normalized index of erythropoietic efficiency and mitigates confounding from hydration status, transfusion, or oxidative stress.^50,51 Its predictive value has been demonstrated across sepsis, hemorrhagic stroke, and various malignancies,^52–54 and in GC, both RDW and HCT have ranked among top predictors in ML models.^55 Collectively, these biomarkers delineate a coherent axis of coagulative, metabolic, and hematopoietic imbalance, hallmarks of tumor progression. Furthermore, Liveraro et al. combined clinical variables with automated deep-learning segmentation of skeletal muscle and adiposity on CT to predict postoperative prognosis in resectable GC, with muscle radiodensity features among the top contributors.^56 These orthogonal data domains (laboratory and body composition imaging) could be integrated in future work to further enhance risk stratification.

Accurate preoperative differentiation between stage I and stages II–III GC is pivotal for optimizing therapeutic strategies, avoiding overtreatment in early-stage patients while ensuring timely neoadjuvant therapy for advanced disease. The proposed framework integrates conventional blood biomarkers with biologically meaningful ratio features, substantially improving discrimination over traditional variables. Comprehensive benchmarking across 11 mainstream algorithms identified CatBoost as the best-performing classifier. Its interpretability, enhanced through SHAP-based global and patient-level explanations, provides mechanistically plausible reasoning for each prediction, facilitating clinician trust and adoption. The simplicity, accessibility, and transparency of this model support its potential use as a pre-endoscopic triage tool, an adjunct to imaging-based assessment, or a scalable solution for resource-limited settings.

Several limitations should be acknowledged. This was a retrospective, single-center study with a moderate sample size, which may limit generalizability and underscores the need for prospective validation. Patients receiving neoadjuvant therapy were excluded, reducing confounding but restricting applicability to such populations. As a national referral center, our cohort included more early-stage patients than typically seen in routine practice, which may introduce case-mix bias. The absence of long-term follow-up and external validation also limits assessment of prognostic value. Furthermore, MMR status was determined solely by postoperative IHC for MLH1, MSH2, MSH6, and PMS2, without germline or MLH1 promoter methylation testing; hence Lynch syndrome could not be distinguished from sporadic dMMR.^57,58 Importantly, this limitation does not affect our main findings, as MMR status was reported only as a baseline characteristic and was not used as a feature in model construction. Finally, reliance on blood-based biomarkers alone may overlook complementary information from imaging modalities, which could further enhance staging models in future studies.

Conclusion

In summary, our interpretable CatBoost model demonstrated excellent predictive capability in differentiating stage I from stages II–III GC patients. SHAP analysis provided valuable explanations at both global and individual levels, enhancing clinical transparency and supporting personalized treatment planning. This study underscores the potential of ML models integrated with accessible blood biomarkers as noninvasive, cost-effective, and practical tools for early-stage GC screening and individualized patient management.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076251388394, sj-docx-2-dhj-10.1177_20552076251388394, and sj-docx-3-dhj-10.1177_20552076251388394 - Supplemental material for Biomarker-based and interpretable machine learning framework for predicting pathological stage in gastric cancer: A retrospective analysis by Guanmo Liu, Sen Yang, Jie Li, Zicheng Zheng, Chenggang Zhang, Yixuan He, Yihua Wang, Weiming Kang and Xin Ye in DIGITAL HEALTH

Footnotes

ORCID iDs

Sen Yang

Jie Li

Yixuan He

Yihua Wang

Weiming Kang

Xin Ye

Ethical approval

This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of PUMCH (approval no. I-23PJ2155).

Contributorship

GL and SY conceptualized the idea for this study. JL, ZZ, and CZ performed the data analysis. YH and YW drafted and critically revised the work. WK and XY supervised and revised the manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the CAMS Innovation Fund for Medical Sciences (2023-I2M-C&T-B-016), Beijing Natural Science Foundation (7232117) and Fundamental Research Funds for the Central Universities, Peking Union Medical College (3332025013 and 3332025120).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Availability of data and materials

Data supporting the findings of this study are available from the corresponding author upon reasonable request.

Informed consent

Written informed consent was obtained from all participants prior to inclusion.

Guarantor

Weiming Kang is the guarantor of this study and accepts full responsibility for its integrity and accuracy. He ensures that all aspects of the research have been conducted in accordance with ethical guidelines and journal requirements. He has overseen the research process, confirms that all authors meet the authorship criteria, and will address any inquiries regarding the study.

Peer review

None.

Supplemental material

Supplemental material for this article is available online.

References

1. Bray, Laversanne, Sung, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024; 74: 229–263.
2. Huang, Shao, et al. Global progress and future prospects of early gastric cancer screening. J Cancer 2024; 15: 3045–3064.
3. Zheng, Chen, Shen, et al. Prognostic factors in stage I gastric cancer: a retrospective analysis. Open Med (Wars) 2020; 15: 754–762.
4. de Jesus VHF, da Costa, Jr Felismino, et al. Survival outcomes of patients with pathological stage I gastric cancer using the competing risks survival method. J Gastrointest Oncol 2019; 10: 1110–1119.
5. Huang, Sun, et al. Effect of laparoscopic vs open distal gastrectomy on 3-year disease-free survival in patients with locally advanced gastric cancer: the CLASS-01 randomized clinical trial. JAMA 2019; 321: 1983–1992.
6. Zheng, Wang, et al. A novel TNM staging system for gastric cancer based on the metro-ticket paradigm: a comparative study with the AJCC-TNM staging system. Gastric Cancer 2019; 22: 759–768.
7. Kim, Kang, Choi, et al. Korean practice guidelines for gastric cancer 2024: an evidence-based, multidisciplinary approach (update of 2022 guideline). J Gastric Cancer 2025; 25: 5–114.
8. Japanese Gastric Cancer Association. Japanese gastric cancer treatment guidelines 2021 (6th edition). Gastric Cancer 2023; 26: 1–25.
9. Kim, Park, Ryu, et al. Phase 3 trial of postoperative chemotherapy alone versus chemoradiation therapy in stage III-IV gastric cancer treated with R0 gastrectomy and D2 lymph node dissection. Int J Radiat Oncol Biol Phys 2012; 84: e585–e592.
10. Kurimoto, Ishigure, Mochizuki, et al. A feasibility study of postoperative chemotherapy with S-1 and cisplatin (CDDP) for stage III/IV gastric cancer (CCOG 1106). Gastric Cancer 2015; 18: 354–359.
11. Sasako, Sakuramoto, Katai, et al. Five-year outcomes of a randomized phase III trial comparing adjuvant chemotherapy with S-1 versus surgery alone in stage II or III gastric cancer. J Clin Oncol 2011; 29: 4387–4393.
12. Nakanishi, Kanda, Ito, et al. Delay in initiation of postoperative adjuvant chemotherapy with S-1 monotherapy and prognosis for gastric cancer patients: analysis of a multi-institutional dataset. Gastric Cancer 2019; 22: 1215–1225.
13. Chen, Xiao, Zhang, et al. Association between adjuvant chemotherapy and survival in stage I gastric cancer patients after curative resection. Gastroenterol Rep (Oxf) 2023; 11: goad070.
14. Yun, Song, Son, et al. Global leadership initiative on malnutrition criteria and immunonutritional status predict chemoadherence and survival in stage II/III gastric cancer treated with XELOX chemotherapy. Nutrients 2024; 16: 3468.
15. Rivera, Galan, Tabernero, et al. Phase II trial of preoperative irinotecan-cisplatin followed by concurrent irinotecan-cisplatin and radiotherapy for resectable locally advanced gastric and esophagogastric junction adenocarcinoma. Int J Radiat Oncol Biol Phys 2009; 75: 1430–1436.
16. Wang, Ren, et al. Docetaxel, oxaliplatin, leucovorin, and 5-fluorouracil (FLOT) as preoperative and postoperative chemotherapy compared with surgery followed by chemotherapy for patients with locally advanced gastric cancer: a propensity score-based analysis. Cancer Manag Res 2019; 11: 3009–3020.
17. Ychou, Boige, Pignon, et al. Perioperative chemotherapy compared with surgery alone for resectable gastroesophageal adenocarcinoma: an FNCLCC and FFCD multicenter phase III trial. J Clin Oncol 2011; 29: 1715–1721.
18. Zhang, Yang, et al. Liquid biopsy: circulating tumor DNA monitors neoadjuvant chemotherapy response and prognosis in stage II/III gastric cancer. Mol Oncol 2023; 17: 1930–1942.
19. Zhang, Hou, et al. Neoadjuvant PD-1 blockade plus chemotherapy versus chemotherapy alone in locally advanced stage II-III gastric cancer: a single-centre retrospective study. Transl Oncol 2023; 31: 101657.
20. Sun, et al. MRI versus dual-energy CT in local-regional staging of gastric cancer. Radiology 2024; 312: e232387.
21. Klingbeil, Eng, Dube, et al. CT imaging as a single modality for clinical staging of gastric cancer in limited resource centers: a retrospective pilot study. J Surg Oncol 2024; 130: 1551–1562.
22. Bian, Wang, et al. Influence of visceral adipose tissue on the accuracy of tumor T-staging of gastric cancer in preoperative CT. Jpn J Radiol 2025; 43: 656–665.
23. Mocellin, Pasquali. Diagnostic accuracy of endoscopic ultrasonography (EUS) for the preoperative locoregional staging of primary gastric cancer. Cochrane Database Syst Rev 2015; 2015: CD009944.
24. Xin, Wang, et al. Prospective comparison of oral contrast-enhanced transabdominal ultrasound imaging with contrast-enhanced computed tomography in pre-operative tumor staging of gastric cancer. Ultrasound Med Biol 2023; 49: 569–577.
25. Yang, et al. Comprehensive machine learning-based preoperative blood features predict the prognosis for ovarian cancer. BMC Cancer 2024; 24: 267.
26. Oliveira. Biotechnology, big data and artificial intelligence. Biotechnol J 2019; 14: e1800613.
27. Mirza, Wang, et al. Machine learning and integrative analysis of biomedical big data. Genes (Basel) 2019; 10: 87.
28. Mahadik, Sen, Shah. Harnessing digital health technologies and real-world evidence to enhance clinical research and patient outcomes. Digit Health 2025; 11: 20552076251362097.
29. Zan, Gao, et al. A machine learning method for identifying lung cancer based on routine blood indices: qualitative feasibility study. JMIR Med Inform 2019; 7: e13476.
30. Lin, Xiao, et al. Colorectal cancer detected by machine learning models using conventional laboratory test data. Technol Cancer Res Treat 2021; 20: 15330338211058352.
31. Kim, Lim, Kim, et al. Effectiveness of a personalized digital exercise and nutrition-based rehab program for patients with gastric cancer after surgery: study protocol for a randomized controlled trial. Digit Health 2023; 9: 20552076231187602.
32. Benhamida, Hechtman, Nafa, et al. Reliable clinical MLH1 promoter hypermethylation assessment using a high-throughput genome-wide methylation array platform. J Mol Diagn 2020; 22: 368–375.
33. Park, Nam, Seo, et al. Comprehensive study of microsatellite instability testing and its comparison with immunohistochemistry in gastric cancers. J Gastric Cancer 2023; 23: 264–274.
34. Nádorvári, Lotz, Kulka, et al. Microsatellite instability and mismatch repair protein deficiency: equal predictive markers? Pathol Oncol Res 2024; 30: 1611719.
35. de la Fouchardière, Cammarota, Svrcek, et al. How do I treat dMMR/MSI gastro-oesophageal adenocarcinoma in 2025? A position paper from the EORTC-GITCG gastro-esophageal task force. Cancer Treat Rev 2025; 134: 102890.
36. Puliga, Corso, Pietrantonio, et al. Microsatellite instability in gastric cancer: between lights and shadows. Cancer Treat Rev 2021; 95: 102175.
37. Zhu, Wang, et al. Microsatellite instability and survival in gastric cancer: a systematic review and meta-analysis. Mol Clin Oncol 2015; 3: 699–705.
38. Wei, Wang, Ouyang, et al. Machine learning for early discrimination between lung cancer and benign nodules using routine clinical and laboratory data. Ann Surg Oncol 2024; 31: 7738–7749.
39. Abu-Freha, Afawi, Yousef, et al. A machine learning approach to differentiate stage IV from stage I colorectal cancer. Comput Biol Med 2025; 191: 110179.
40. Yang, et al. Explainable machine learning models for early gastric cancer diagnosis. Sci Rep 2024; 14: 17457.
41. Tan, Feng, Huang, et al. Development and validation of a radiopathomics model based on CT scans and whole slide images for discriminating between stage I-II and stage III gastric cancer. BMC Cancer 2024; 24: 368.
42. Chen, Wang, Zhao, et al. Metabolomic machine learning predictor for diagnosis and prognosis of gastric cancer. Nat Commun 2024; 15: 1657.
43. Cai, Bian, et al. Predicting early gastric cancer risk using machine learning: a population-based retrospective study. Digit Health 2024; 10: 20552076241240905.
44. Kuo, Luo, See, et al. Increased risk of cancer among gout patients: a nationwide population study. Joint Bone Spine 2012; 79: 375–378.
45. Wang, et al. Increased risk of cancer in relation to gout: a review of three prospective cohort studies with 50,358 subjects. Mediators Inflamm 2015; 2015: 680853.
46. Elstein, Rosenmann, Reinus, et al. Amyloidosis and gastric bleeding in a patient with Gaucher disease. J Clin Gastroenterol 2003; 37: 234–237.
47. Bhushan, Sohal, Kowdley. Primary biliary cholangitis and primary sclerosing cholangitis therapy landscape. Am J Gastroenterol 2025; 120: 151–158.
48. Zheng, Chen. The combination of seven preoperative markers for predicting patients with gastric cancer to be either stage IV or non-stage IV. Gastroenterol Res Pract 2018; 2018: 3450981.
49. Shen, Wei, Tian, et al. Coagulation indices and fibrinogen degradation products as predictive biomarkers for tumor-node-metastasis staging and metastasis in gastric cancer. World J Gastrointest Oncol 2025; 17: 98725.
50. Salvagno, Sanchis-Gomar, Picanza, et al. Red blood cell distribution width: a simple parameter with multiple clinical applications. Crit Rev Clin Lab Sci 2015; 52: 86–105.
51. Jiang, Zou, Zhao, et al. Erythrocyte transfusion limits the role of elevated red cell distribution width on predicting cardiac surgery associated acute kidney injury. Cardiol J 2021; 28: 255–261.
52. Tham, Olson, Wotman, et al. Evaluation of the prognostic utility of the hemoglobin-to-red cell distribution width ratio in head and neck cancer. Eur Arch Otorhinolaryngol 2018; 275: 2869–2878.
53. Wang, Chen, Yang, et al. Relationship between the hemoglobin-to-red cell distribution width ratio and all-cause mortality in septic patients with atrial fibrillation: based on propensity score matching method. J Cardiovasc Dev Dis 2022; 9: 400.
54. Liu, Wang. Association between hemoglobin-to-red blood cell distribution width ratio and hospital mortality in patients with non-traumatic subarachnoid hemorrhage. Front Neurol 2023; 14: 1180912.
55. Yang, Cheng, et al. GC discrimination: identification of gastric cancer based on a milliliter of blood. Brief Bioinform 2021; 22: 536–544.
56. Liveraro GSS, Takahashi MES, Lascala, et al. Improving resectable gastric cancer prognosis prediction: a machine learning analysis combining clinical features and body composition radiomics. Inform Med Unlocked 2025; 52: 101608.
57. Wang, Zhang, Vakiani, et al. Detecting mismatch repair deficiency in solid neoplasms: immunohistochemistry, microsatellite instability, or both? Mod Pathol 2022; 35: 1515–1528.
58. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 2014; 513: 202–209.
