Abstract
Objective
Critically ill patients with digestive system tumors are at high risk of acute kidney injury (AKI), a complication strongly associated with increased mortality and adverse outcomes. However, early AKI risk identification in the intensive care unit (ICU) remains challenging. This study aimed to develop and externally validate an interpretable model for early AKI risk prediction in critically ill patients with digestive system tumors.
Methods
We retrospectively analyzed 3,821 patients with digestive system tumors from the MIMIC-IV 3.0 database. Routine clinical variables from the first 24 hours after ICU admission were extracted. Least absolute shrinkage and selection operator (LASSO) regression and multivariable logistic regression were used for feature selection, followed by the development and comparison of six machine learning models. Model performance was evaluated using the AUC, calibration analysis, and decision curve analysis. The optimal model was interpreted using Shapley Additive Explanations (SHAP), and an online interactive prediction tool was developed to facilitate clinical translation. External validation was conducted in three independent cohorts from the United States and China, including the eICU Collaborative Research Database (eICU), the Tangshan tumor-related AKI cohort (TS-TAKI), and the Beijing Acute Kidney Injury cohort (BAKIT).
Results
The incidence of AKI was 75.8%. Among all models, extreme gradient boosting (XGBoost) showed the best overall performance, with an AUC of 0.765 (95% CI 0.745–0.786) in the training set and 0.742 (95% CI 0.710–0.773) in the validation set, a low Brier score, good calibration, and meaningful clinical net benefit. SHAP analysis identified the SOFA score, mechanical ventilation, WBC count, age, serum potassium level, sepsis, and vasoactive drug use as key predictors, with consistent interpretability at both the population and individual levels. In external validation, the model demonstrated stable discrimination across multicenter populations (AUCs of 0.719 in eICU, 0.769 in TS-TAKI, and 0.616 in BAKIT); however, calibration performance was notably affected by population heterogeneity, suggesting the need for recalibration using local data in real-world application.
Conclusion
This interpretable XGBoost model, built on routine ICU data, enables early AKI risk stratification in critically ill patients with digestive system tumors. The model achieves a balance between predictive performance, transparency, and clinical feasibility. It exhibited stable discriminative performance across multicenter and cross-population cohorts, while calibration was clearly affected by population heterogeneity, highlighting the need for local validation and recalibration in real-world application. An online prediction tool further supports its potential for clinical translation.
Keywords
Background
Tumors of the digestive system are among the leading malignancies worldwide in both incidence and mortality and have become a major public health problem. 1 Critically ill patients with these malignancies are particularly vulnerable to acute kidney injury (AKI) due to the convergence of tumor-related physiological stress, treatment-associated nephrotoxicity, systemic inflammation, and hemodynamic instability.2–4 Once AKI develops, it is associated with markedly increased in-hospital mortality, interruption or modification of anticancer therapy, and worse long-term outcomes. 5 Notably, a nationwide survey in China identified digestive system malignancies as the most common primary cancers among patients with malignancy-associated AKI, accounting for more than half of reported cases, underscoring the clinical importance of early risk stratification in this high-risk population. 6
Current AKI diagnosis and risk assessment rely predominantly on the Kidney Disease: Improving Global Outcomes (KDIGO) criteria, which are based on elevations in serum creatinine (SCr) or reductions in urine output. 7 However, these criteria have inherent limitations in early risk prediction. Biomarker changes often lag behind the onset of kidney injury, and in patients with cancer, SCr and urine output may be influenced by baseline renal function, fluid status, and oncologic treatments, thereby obscuring early renal injury.
In recent years, machine learning (ML) approaches have shown considerable promise for the early prediction of AKI.8,9 However, most existing models have been developed in mixed ICU populations, with limited focus on specific subgroups. As a result, these models may not adequately capture the unique risk profiles and pathophysiological characteristics of critically ill patients with digestive system tumors.
Moreover, many ML models remain “black boxes,” offering limited interpretability of how predictors contribute to risk estimation. This lack of transparency hampers clinical trust and adoption. 10 Improving model interpretability and enabling real-time application in clinical settings therefore remain critical challenges. In addition, few studies have conducted systematic external validation across multicenter and cross-population cohorts, leaving the generalizability of existing models uncertain.11,12 Robust external validation is essential to ensure model stability and reliability in real-world practice.
To address these limitations, this study incorporates the Shapley Additive Explanations (SHAP) framework to enhance model interpretability. An online interactive tool was further developed to enable real-time AKI risk estimation based on routinely available clinical variables, thereby supporting early risk identification and clinical decision-making. External validation was performed across multiple independent cohorts, aiming to develop a population-specific, interpretable, and clinically applicable prediction model for AKI risk stratification in critically ill patients with digestive system tumors.
Methods
Data source and ethics statement
This retrospective study was conducted in accordance with the Declaration of Helsinki and was reported following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. The model development cohort was derived from the Medical Information Mart for Intensive Care IV (MIMIC-IV 3.0) database, which contains detailed clinical data of critically ill patients in the United States. 13 All data in MIMIC-IV are fully de-identified. Access to the database was granted after completion of the required ethics training and certification (certification ID: 59209829).
External validation was performed using three independent cohorts: the multicenter eICU Collaborative Research Database (eICU-CRD), a single-center Chinese Tangshan Cohort Database for Tumor-Associated Acute Kidney Injury (TS-TAKI), and the multicenter Beijing AKI Clinical Trial (BAKIT) database. The eICU-CRD is a publicly available database derived from multiple intensive care units across the United States and includes detailed, high-resolution clinical data from more than 200,000 ICU admissions. The TS-TAKI database is a single-center database derived from a cohort study on tumor-associated AKI that was registered at the Chinese Clinical Trial Registry (ChiCTR2500103958, registered on 9 June 2025) and conducted at Tangshan People’s Hospital. The BAKIT database is a multicenter prospective cohort focusing on the epidemiology of AKI among ICU patients in Beijing, China, and includes comprehensive clinical data collected from 30 ICUs across 28 tertiary hospitals. Data collection and usage strictly complied with ethical standards. The eICU-CRD consists of fully de-identified data and is exempt from informed consent. The TS-TAKI cohort was approved by the institutional ethics committee of Tangshan People’s Hospital (approval No. RMYY-LLKS-2024039). Use of the BAKIT database was approved by the Ethics Committee of Beijing Fuxing Hospital, Capital Medical University (approval No. 2010FXHEC-KF026), with a waiver of informed consent.
Inclusion and exclusion criteria
Patients were screened from the MIMIC-IV database to construct the study cohort. The inclusion criteria were as follows: (1) age ≥ 18 years and ≤ 89 years; (2) a diagnosis of digestive system tumor, identified according to ICD-9/10 codes; and (3) complete records of renal function monitoring and key clinical variables after ICU admission. The exclusion criteria were: (1) repeated ICU admissions (only the first ICU admission record was retained); (2) ICU length of stay < 24 hours; (3) AKI already diagnosed before or at the time of ICU admission; (4) renal replacement therapy (RRT) received prior to or at ICU admission; and (5) missing values in more than 20% of secondary clinical variables. The external validation cohorts applied the same inclusion and exclusion criteria as those used in the model development phase. The specific study workflow is shown in Figure 1.
Figure 1. Flow chart of participant selection and exclusion.
Data collection
Digestive system tumors were identified according to International Classification of Diseases, Ninth and Tenth Revision (ICD-9/10) codes, including neoplasms of the esophagus, stomach, colorectum, liver, biliary tract, pancreas, and other digestive organs. Tumors were not further stratified by pathological behavior; benign, malignant, and unspecified neoplasms were all included to reflect real-world clinical practice in the intensive care setting. The prediction starting point was defined as ICU admission. The predictor variables were extracted from the first recorded measurements within the first 24 hours of ICU admission to reflect the patient’s early physiological status. The primary outcome was the new onset of AKI within 72 hours of ICU admission. To ensure temporal consistency in predictions, patients who met the KDIGO diagnostic criteria for AKI prior to or at the time of ICU admission were excluded from the analysis.
AKI diagnosis was based on the KDIGO clinical practice guidelines: a SCr increase of ≥26.5 μmol/L within 48 hours, or an increase to ≥1.5 times the baseline value, or urine output <0.5 mL/kg/h for ≥6 hours. The baseline SCr was defined as the most recent measurement available within 7 days prior to ICU admission; if no record was available within this timeframe, the lowest SCr value within the first 24 hours after admission was used as the substitute baseline. Patients without any available SCr measurement within either timeframe were excluded because AKI status could not be determined. Hourly urine output records were extracted from the database; when hourly data were incomplete, a 6-hour rolling window was applied to aggregate urine volume, which was then divided by the corresponding duration and patient body weight to derive the hourly rate. Body weight was obtained from admission records and, if unavailable, was imputed using the median weight for the corresponding sex and age group within the cohort.
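The criteria described above can be illustrated with a minimal Python sketch. It is not the study's extraction code: the function names, the data layout (time-stamped SCr readings, a list of hourly urine volumes), and the choice to compare each SCr value against earlier readings within the preceding 48 hours are assumptions made for illustration. The urine criterion uses the mean rate over a 6-hour rolling window, following the aggregation described in the text.

```python
from typing import Sequence, Tuple

SCR_ABS_RISE = 26.5   # umol/L, absolute rise within 48 h
SCR_RATIO = 1.5       # fold increase over baseline
UO_RATE = 0.5         # mL/kg/h
UO_WINDOW_H = 6       # hours in the rolling window


def meets_scr_criterion(baseline: float,
                        readings: Sequence[Tuple[float, float]]) -> bool:
    """readings: (hours since ICU admission, SCr in umol/L), sorted by time."""
    for i, (t_i, scr_i) in enumerate(readings):
        # Relative criterion: SCr reaches at least 1.5 times baseline.
        if scr_i >= SCR_RATIO * baseline:
            return True
        # Absolute criterion (assumed reading): rise of >= 26.5 umol/L
        # versus any earlier measurement taken within the previous 48 h.
        for t_j, scr_j in readings[:i]:
            if t_i - t_j <= 48 and scr_i - scr_j >= SCR_ABS_RISE:
                return True
    return False


def meets_urine_criterion(hourly_ml: Sequence[float], weight_kg: float) -> bool:
    """True if any 6-hour window has a mean rate below 0.5 mL/kg/h."""
    for start in range(len(hourly_ml) - UO_WINDOW_H + 1):
        window = hourly_ml[start:start + UO_WINDOW_H]
        rate = sum(window) / (UO_WINDOW_H * weight_kg)
        if rate < UO_RATE:
            return True
    return False


def has_aki(baseline: float, readings, hourly_ml, weight_kg: float) -> bool:
    """AKI if either the creatinine or the urine output criterion is met."""
    return (meets_scr_criterion(baseline, readings)
            or meets_urine_criterion(hourly_ml, weight_kg))
```

Averaging over the window is less strict than requiring every individual hour to fall below the threshold, so the two readings of the urine criterion can classify borderline patients differently.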
Table 1. Baseline characteristics of patients with digestive system tumors.
# represents the Z value.
Abbreviations: AKI, acute kidney injury; SOFA, Sequential Organ Failure Assessment; ACEI, angiotensin-converting enzyme inhibitor; ARB, angiotensin receptor blocker; WBC, white blood cell; RBC, red blood cell. The SOFA score was calculated within the first 24 hours after ICU admission.
Statistical method
All statistical analyses were performed using SPSS software (version 26.0) and R software (version 4.4.2). Continuous variables with a normal distribution are presented as mean ± standard deviation and were compared using the t test, whereas non-normally distributed variables are expressed as median (interquartile range) and were compared using the Mann–Whitney U test. Categorical variables are presented as number (percentage) and were compared using the chi-square test or Fisher’s exact test, as appropriate. A two-sided P value < 0.05 was considered statistically significant.
Model development, evaluation, and external validation
During model development, candidate clinical variables were preprocessed, including standardization of variable types, outlier detection, and handling of missing data. Continuous variables were winsorized at the 1st and 99th percentiles to attenuate extreme outliers, and all continuous predictors were standardized using z-score normalization prior to model training. Variables with ≤20% missingness were imputed using multiple imputation by chained equations (MICE) with five imputations, and estimates were pooled using Rubin’s rules. Variables with >20% missingness or limited clinical relevance were excluded.14,15
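As an illustration of the two continuous-variable steps above, the following sketch winsorizes at the 1st and 99th percentiles and then applies z-score normalization. It uses only the Python standard library; the percentile interpolation rule and the function names are assumptions, since the text does not specify the implementation used in R.

```python
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize(values, lower_q=1.0, upper_q=99.0):
    """Clip values to the [lower_q, upper_q] percentile range."""
    s = sorted(values)
    lo, hi = percentile(s, lower_q), percentile(s, upper_q)
    return [min(max(v, lo), hi) for v in values]


def zscore(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical lab values with one extreme outlier.
raw = [float(i) for i in range(100)] + [1000.0]
clipped = winsorize(raw)
standardized = zscore(clipped)
```

In a train/test design, a common precaution is to estimate the clipping limits, mean, and standard deviation on the training set only and then apply them unchanged to the test and external cohorts; the text does not state which convention was followed.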
The dataset was randomly split into training and testing sets in a 7:3 ratio using stratified sampling. Candidate predictors were first subjected to feature selection using least absolute shrinkage and selection operator (LASSO) regression, with the optimal penalty parameter determined by 10-fold cross-validation under the one-standard-error rule. Selected variables were then entered into a multivariable logistic regression model to estimate odds ratios (ORs) with 95% confidence intervals (CIs), while variance inflation factors (VIFs) were calculated to exclude significant multicollinearity (VIF < 5).
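The one-standard-error rule mentioned above can be sketched as follows. Given the mean cross-validated deviance and its standard error for each candidate penalty, the rule picks the largest λ whose mean deviance lies within one standard error of the minimum, which favors a sparser model. The λ grid and deviance values below are invented for illustration and are not the study's results.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries."""
    # Index of the penalty with the lowest mean CV deviance.
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    # Any penalty whose deviance is within one SE of the minimum is eligible.
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = [i for i in range(len(lambdas)) if cv_mean[i] <= threshold]
    # The 1-SE rule takes the strongest (largest) eligible penalty.
    i_1se = max(eligible, key=lambda i: lambdas[i])
    return lambdas[i_min], lambdas[i_1se]


# Hypothetical CV results (not taken from the study).
lambdas = [0.005, 0.01, 0.02, 0.04, 0.08]
cv_mean = [1.015, 1.010, 1.014, 1.019, 1.050]
cv_se = [0.010, 0.010, 0.010, 0.010, 0.010]

lam_min, lam_1se = select_lambda(lambdas, cv_mean, cv_se)
```

With these illustrative numbers the minimum-deviance penalty is 0.01, while the 1-SE rule selects 0.04, the largest penalty that is statistically indistinguishable from the minimum.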
Based on the selected features, six models were developed, including decision tree (DT), k-nearest neighbors (KNN), Light Gradient Boosting Machine (LightGBM), logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost). All models were trained on the training set and tuned using 5-fold cross-validation, with the area under the receiver operating characteristic curve (AUC) as the primary metric, supplemented by accuracy and specificity. Hyperparameters were optimized within predefined ranges (e.g., XGBoost: depth 3–8, learning rate 0.0001–0.1; LightGBM: number of trees 100–500; KNN: neighbors 3–11).
Model performance was evaluated on the independent testing set, and the best-performing model was selected based on overall discrimination and stability. SHAP was subsequently applied to enhance model interpretability, and an online decision-support tool was developed to facilitate clinical application.
External validation was conducted using the multicenter eICU database (predominantly Western populations), the single-center TS-TAKI cohort, and the multicenter BAKIT database (predominantly Chinese populations) to assess generalizability across heterogeneous populations. Model performance was evaluated in terms of discrimination, calibration, and clinical utility using receiver operating characteristic (ROC) curves, calibration plots, and decision curve analysis (DCA).
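For readers less familiar with the discrimination metric, the sketch below computes the AUC as the probability that a randomly chosen AKI patient receives a higher predicted risk than a randomly chosen non-AKI patient (ties count one half), and derives a percentile bootstrap confidence interval. The bootstrap is an assumption for illustration; the text does not state how the reported 95% CIs were obtained.

```python
import random


def auc(y_true, y_score):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling patients)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both outcome classes
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical outcomes and predicted risks.
y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auc(y, scores)
ci_low, ci_high = bootstrap_ci(y, scores, n_boot=200)
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch but slow for large cohorts; rank-based formulas give the same value more efficiently.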
Results
Baseline feature comparison
A total of 3,821 critically ill patients with digestive system tumors were included in this study, of whom 2,898 (75.8%) developed AKI within 72 hours after ICU admission. The cohort was randomly divided into a training set (70%, n=2,674) and an internal validation set (30%, n=1,147).
Baseline characteristics of the two cohorts are presented in Table 1. Overall, demographic characteristics, admission characteristics, comorbidities, treatments and interventions, vital signs, and most laboratory measurements were comparable between the training and validation sets. Although small statistical differences were observed in platelet count and immunosuppressant use, the absolute differences were modest and were not considered clinically meaningful. These findings indicate that the random allocation resulted in two well-balanced cohorts suitable for subsequent model development and validation.
Prediction model variable selection
LASSO regression identified seven predictors with non-zero coefficients at the optimal penalty parameter (λ=0.026) (Figure 2). The missingness of these variables was reviewed during the data preprocessing stage, with rates of 0% for SOFA score, mechanical ventilation, age, sepsis, and vasoactive drug use, and 2.1% and 1.8% for WBC count and serum potassium, respectively. All variables had missingness below 5%, reflecting high data completeness attributable to the use of routinely collected variables within the first 24 hours of ICU admission.
Figure 2. LASSO regression for feature selection. (A) Ten-fold cross-validation plot of binomial deviance as a function of log(λ). The vertical dashed lines indicate the optimal λ value based on the minimum criterion (left) and the 1-standard-error rule (right), respectively. (B) Coefficient profiles of the candidate variables as a function of log(λ). At the optimal λ determined by the 1-SE rule, 7 variables with nonzero coefficients were retained.
Multivariable logistic regression analysis for predictors of AKI.
B represents the regression coefficient; SE, standard error; OR, odds ratio; CI, confidence interval. SOFA, Sequential Organ Failure Assessment; WBC, white blood cell; RBC, red blood cell.
Model performance comparison
Performance comparison of prediction models in the test cohort.
DT, decision tree; KNN, k-nearest neighbor; LightGBM, light gradient boosting machine; LR, logistic regression; RF, random forest; XGBoost, extreme gradient boosting.
Figure 3. Performance comparison of six prediction models for AKI in critically ill patients with digestive system tumors. (A) ROC curves. (B) Calibration curves comparing predicted and observed AKI probabilities. (C) DCA in the training set. (D) DCA in the testing set. DT, decision tree; RF, random forest; XGBoost, extreme gradient boosting; LR, logistic regression; LightGBM, light gradient boosting machine; KNN, k-nearest neighbor.
Calibration analysis demonstrated good agreement between predicted and observed probabilities, with XGBoost achieving the lowest Brier score (0.1586), indicating optimal calibration performance (Figure 3(b)).
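To put the Brier score in context, the sketch below defines it as the mean squared difference between predicted probability and observed outcome, and compares it with the score of an uninformative model that assigns every patient the cohort prevalence, which equals p(1 − p). Applying the development-cohort incidence of 75.8% to the test set is an assumption (the split was stratified, so the two should be close), giving a reference value of roughly 0.18.

```python
def brier_score(y_true, p_pred):
    """Mean squared error between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def reference_brier(prevalence):
    """Brier score of a model predicting the prevalence for every patient."""
    return prevalence * (1.0 - prevalence)


def scaled_brier(brier, prevalence):
    """Relative improvement over the prevalence-only reference (1 = perfect)."""
    return 1.0 - brier / reference_brier(prevalence)


# Hypothetical predictions for four patients.
example = brier_score([1, 0, 1, 1], [0.9, 0.2, 0.8, 0.6])

# Reported XGBoost Brier score versus the assumed 75.8% prevalence reference.
reference = reference_brier(0.758)
improvement = scaled_brier(0.1586, 0.758)
```

Because the Brier score depends on outcome prevalence, values are not directly comparable between cohorts with very different AKI incidence; the scaled version is one way to make that comparison more interpretable.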
DCA further showed that XGBoost provided greater net benefit than the other models across a wide range of clinically relevant threshold probabilities, particularly between 10% and 70% (Figure 3(c) and (d)).
Considering its discrimination, calibration, and clinical net benefit, XGBoost was selected as the optimal model for subsequent explainability analysis and external validation.
Model interpretation
To enhance interpretability, SHAP was applied to the optimal XGBoost model. The global SHAP summary plot (Figure 4(a)) ranked predictors by mean absolute SHAP values, identifying SOFA score, mechanical ventilation, WBC count, age, serum potassium, sepsis, and vasoactive drug use as the most influential factors. Higher SOFA score, older age, elevated WBC count, increased serum potassium, and the presence of mechanical ventilation, sepsis, or vasoactive drug use were associated with increased predicted AKI risk, whereas lower values showed negative contributions. These patterns are consistent with established AKI pathophysiology, supporting the biological plausibility of the model and its potential clinical applicability.
Figure 4. SHAP-based interpretation of the XGBoost model. (A) Global SHAP summary (beeswarm) plot showing the relative importance and direction of effects of the selected predictors, ranked by mean absolute SHAP values. (B) SHAP force plot illustrating the contribution of individual predictors to the predicted AKI risk for a representative patient.
At the individual level, a representative case illustrated by a SHAP force plot (Figure 4(b)) showed that mechanical ventilation and elevated WBC count were the main contributors to increased AKI risk, while lower SOFA score, younger age, absence of sepsis, lower serum potassium, and no vasoactive drug use exerted protective effects. This additive decomposition translates complex model outputs into intuitive, patient-specific explanations, thereby improving clinical interpretability. 16
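The additive decomposition referred to above can be made concrete with a small, self-contained sketch that computes exact Shapley values by enumerating feature subsets. The scoring function, its coefficients, and the patient and reference values are hypothetical and far simpler than the XGBoost model; absent features are replaced by a single reference value, whereas the SHAP library averages over a background dataset. The point is the additivity property: the contributions sum to the difference between the patient's score and the reference score.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value, others the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical risk score with one interaction term (illustration only)."""
    sofa, ventilated, wbc = z
    return 0.2 * sofa + 0.8 * ventilated + 0.05 * wbc + 0.1 * sofa * ventilated


patient = [8, 1, 15]     # SOFA 8, ventilated, WBC 15 (hypothetical)
reference = [4, 0, 9]    # hypothetical reference patient
phi = shapley_values(toy_score, patient, reference)
gap = toy_score(patient) - toy_score(reference)
```

Exact enumeration grows exponentially with the number of features, which is why tree-specific algorithms are used in practice; with seven predictors, however, full enumeration would still be feasible.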
Model visualization and clinical implementation
To facilitate clinical translation, an online interactive prediction tool was developed using the R Shiny framework (https://digestive-aki-model.shinyapps.io/digestive-aki-app1/; Figure 5). Clinicians can input patient-specific variables to obtain real-time predicted probabilities of AKI, accompanied by SHAP summary and force plots for visualized and interpretable outputs, thereby supporting risk assessment and clinical decision-making.
Figure 5. Online AKI risk prediction tool for critically ill patients with digestive system tumors.
External validation
External validation was performed using three independent cohorts, eICU (n=1,670), TS-TAKI (n=352), and BAKIT (n=227), following the same inclusion and exclusion criteria as in the development phase. Standardized mean differences (SMDs) were used to assess baseline balance between the development and validation cohorts (Supplementary Table 2). Substantial heterogeneity was observed across cohorts, with significant differences in key predictors (all P < 0.001). The eICU cohort represented a lower-risk population with lower rates of mechanical ventilation and sepsis, lower SOFA scores, and a lower AKI incidence (17.7%). The TS-TAKI cohort was characterized by high mechanical ventilation but low sepsis rates and lower WBC levels, with an AKI incidence of 15.3%, reflecting differences in disease spectrum among Chinese oncology ICU patients and variability in sepsis diagnostic criteria across centers. The BAKIT cohort showed moderate similarity in some variables but remained imbalanced in sepsis and SOFA score, with an AKI incidence of 38.8%. Notably, substantial variation in sepsis prevalence and AKI incidence across cohorts (SMD ≥ 0.50; range 15.3%–75.8%) indicated marked case-mix differences.
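The SMD used for these comparisons can be sketched with the standard pooled-variance formulas for continuous and binary variables. The function names are illustrative, and the exact variant used in the study is not stated. The usage lines apply the binary formula to the AKI incidences reported above (75.8% in the development cohort versus 17.7% in eICU).

```python
import math


def smd_continuous(mean1, sd1, mean2, sd2):
    """Standardized mean difference using the average of the two variances."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2.0)
    return (mean1 - mean2) / pooled_sd


def smd_binary(p1, p2):
    """Standardized difference between two proportions."""
    pooled_var = (p1 * (1.0 - p1) + p2 * (1.0 - p2)) / 2.0
    if pooled_var == 0.0:
        return 0.0  # both proportions are 0 or 1; no variability to scale by
    return (p1 - p2) / math.sqrt(pooled_var)


# AKI incidence: development cohort versus eICU (values reported in the text).
smd_aki_eicu = smd_binary(0.758, 0.177)
```

Unlike a P value, the SMD does not shrink or grow with sample size, which makes it better suited to judging whether a validation cohort resembles the development cohort.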
In terms of discrimination (Figure 6(a)), the XGBoost-AKI model achieved AUCs of 0.765 (95% CI 0.745–0.786) in the training set and 0.742 (0.710–0.773) in the internal testing set. In external validation, performance was highest in the TS-TAKI cohort (AUC=0.769; 0.701–0.833), followed by eICU (AUC=0.719; 0.684–0.752), and lowest in BAKIT (AUC=0.616; 0.539–0.686). These findings suggest that model discrimination was strongly influenced by differences in disease severity and outcome prevalence; when such differences are substantial, model performance may decline even if individual predictors show small SMDs.
Figure 6. External validation of the XGBoost model across independent cohorts. (A) ROC curves with 95% confidence intervals for the training set, internal test set, and three external validation cohorts (eICU, TS-TAKI, and BAKIT). (B) Calibration curves comparing predicted and observed probabilities of acute kidney injury in the internal test set and external validation cohorts. (C) Decision curve analysis showing the net clinical benefit of the model compared with treat-all and treat-none strategies across a range of threshold probabilities in each dataset.
Calibration analysis (Figure 6(b)) showed poor agreement in all external cohorts (Hosmer–Lemeshow test, all P < 0.001, Supplementary Table 3). The model tended to overestimate AKI risk in the lower-incidence eICU and TS-TAKI cohorts. In the BAKIT cohort, calibration curves were unstable due to internal variability in predictor distributions, highlighting the central role of systematic shifts in baseline risk and predictor distributions in driving miscalibration.
Decision curve analysis (Figure 6(c)) revealed an overall reduction in the model’s net benefit across the external validation cohorts. While a relative advantage persisted at low threshold probabilities, this benefit progressively attenuated with increasing thresholds. This pattern was primarily attributable to heterogeneity in baseline AKI incidence among cohorts. In high-baseline-risk populations, the treat-all strategy itself already conferred a substantial net benefit, thereby limiting the incremental gain achievable by the model. In low-baseline-risk cohorts (such as eICU and TS-TAKI), the model could still refine risk stratification, but its clinical net benefit was constrained by miscalibration.
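The dependence of net benefit on baseline incidence follows directly from its definition, which the sketch below makes explicit. Net benefit at threshold probability pt is the true-positive fraction minus the false-positive fraction weighted by pt/(1 − pt); for the treat-all strategy this reduces to prevalence − (1 − prevalence) × pt/(1 − pt). Function names and the four-patient example are illustrative.

```python
def net_benefit(y_true, p_pred, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit when every patient is treated as high risk."""
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Treat-all at a 30% threshold for two of the incidences reported in the text.
high_risk_cohort = net_benefit_treat_all(0.758, 0.30)   # development cohort
low_risk_cohort = net_benefit_treat_all(0.177, 0.30)    # eICU

# Hypothetical four-patient example.
toy = net_benefit([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], 0.5)
```

Treat-all net benefit stays positive for every threshold below the cohort prevalence, so in a cohort with 75.8% incidence it remains a strong comparator over most of the clinically relevant range, whereas in a cohort with 17.7% incidence it turns negative once the threshold exceeds 17.7%.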
Overall, the XGBoost-AKI model demonstrated favorable discrimination across multicenter external validation cohorts; however, its calibration performance and clinical net benefit varied significantly across heterogeneous populations. Baseline characteristic imbalance, particularly distributional shifts in AKI incidence and disease severity indicators, was the principal driver of these observed discrepancies. These findings underscore the importance of assessing population applicability using SMDs and accounting for heterogeneity in cross-center implementation. Model recalibration may be necessary to improve risk estimation and enhance its real-world clinical utility.
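One simple form of the recalibration suggested above is an intercept-only update ("recalibration in the large"): a constant is added to the logit of every predicted risk so that the mean predicted risk equals the AKI incidence observed in the local cohort. This is a generic technique shown for illustration, not a procedure reported in the study; the predicted risks below are hypothetical, and 17.7% is the eICU incidence from the text.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def intercept_shift(p_pred, observed_rate, tol=1e-10):
    """Find delta so that mean(sigmoid(logit(p) + delta)) == observed_rate."""
    logits = [logit(p) for p in p_pred]
    lo, hi = -20.0, 20.0
    # The mean recalibrated risk increases monotonically with delta,
    # so bisection converges to the unique solution.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_p = sum(sigmoid(z + mid) for z in logits) / len(logits)
        if mean_p > observed_rate:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def recalibrate(p_pred, delta):
    """Apply the intercept shift to each predicted risk."""
    return [sigmoid(logit(p) + delta) for p in p_pred]


# Hypothetical risks from a model developed where AKI incidence was high.
original = [0.60, 0.70, 0.80, 0.90, 0.75]
delta = intercept_shift(original, observed_rate=0.177)
updated = recalibrate(original, delta)
```

Because the same constant is added to every logit, patient ranking and therefore the AUC are unchanged; only the absolute risk estimates move. When miscalibration also involves the spread of predictions, a slope term can be re-estimated as well.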
Discussion
AKI remains a frequent and serious complication in critically ill patients, 17 with a particularly high burden among those with digestive system tumors. In this population, overlapping renal insults arising from malignancy-related metabolic stress, intensive care interventions, and early multi-organ dysfunction substantially increase the risk of AKI. In the present study, we developed and externally validated an interpretable XGBoost-based model for early AKI risk stratification in this high-risk population.
A major strength of this study lies in its explicit clinical positioning and methodological specificity. Rather than modeling tumor biology, tumor stage, or treatment-specific characteristics, our approach focused on assessing early renal vulnerability at ICU admission using routinely available clinical variables. This design addresses a critical unmet need in real-world critical care settings, where detailed oncologic information is often unavailable or incomplete during the early phase of ICU admission, whereas timely risk assessment remains essential for guiding immediate clinical decisions. Furthermore, most existing AKI prediction models have been developed in mixed ICU populations, with limited focus on specific subgroups such as patients with digestive system tumors. By concentrating on this high-risk population and conducting systematic external validation across multicenter and cross-population cohorts, this study advances beyond generic ICU risk models and provides a population-specific, interpretable, and clinically applicable tool for onco-nephrology risk stratification.
Among six candidate algorithms, the XGBoost model achieved the most favorable balance between discrimination, calibration, and clinical utility. It is important to contextualize these performance metrics within the framework of class imbalance. The AKI incidence of 75.8% in the development cohort reflects the high baseline renal vulnerability of this specific population, which inherently constrains the utility of overall accuracy as a primary performance indicator. A naive classifier predicting AKI for all patients would achieve 75.8% accuracy yet would provide no clinically actionable discrimination and would misclassify all non-AKI patients. In contrast, our XGBoost model achieved an AUC of 0.742 with meaningful specificity (70.4%) and a high F1 score (0.740), indicating robust performance in distinguishing high-risk from low-risk individuals. When benchmarked against recent machine learning models for AKI prediction in ICU subpopulations, our discriminative performance appears modest relative to those developed in respiratory failure cohorts (AUC 0.902) 18 or persistent sepsis-associated AKI (AUC 0.870–0.932), 19 which often leverage more dynamic physiological parameters or disease-specific biomarkers. However, it aligns closely with performance reported in cardiac-specific ICU cohorts (AUC 0.765), 20 suggesting that early prediction in narrowly defined, high-risk populations presents distinct methodological challenges. Notably, unlike single-database studies with limited external validation, 21 our model was validated across three independent cohorts spanning different countries and healthcare systems, including a prospectively collected Chinese oncology ICU cohort (TS-TAKI). This rigorous validation framework, combined with SHAP-based interpretability and deployment as an online clinical decision-support tool, prioritizes clinical feasibility and generalizability rather than maximal discriminative metrics alone.
Moreover, the model demonstrated favorable calibration (Brier score 0.1586) and positive net benefit on decision curve analysis across clinically relevant threshold probabilities. These complementary metrics collectively indicate that the model offers genuine clinical utility beyond prevalence-driven guessing, supporting its value as an early warning decision-support tool.
The seven selected predictors—SOFA score, mechanical ventilation, white blood cell count, age, serum potassium, sepsis, and vasoactive drug use—were all available within the first 24 hours of ICU admission, prioritizing feasibility and early usability for real-world ICU workflows. While these variables are not oncology-specific per se, they capture the cumulative effects of malignancy-related stress, disease severity, and early organ dysfunction on renal susceptibility in this population. SHAP analysis further clarified the clinical relevance of these predictors. The SOFA score emerged as the most influential contributor, underscoring the central role of multi-organ dysfunction in the development of AKI.22,23 Mechanical ventilation likely reflects the combined effects of hypoxemia, 24 positive pressure–related hemodynamic changes, and overall illness severity. 25 Elevated WBC counts highlight the contribution of systemic inflammation, 26 whereas increasing age reflects reduced renal reserve and increased susceptibility to injury. 8 Elevated serum potassium levels may indicate early renal impairment or severe metabolic disturbances. 27 Sepsis and the use of vasoactive drugs showed clear positive contributions, consistent with infection-driven inflammatory cascades and circulatory instability.28,29 The direction and magnitude of these effects align closely with established pathophysiological mechanisms of AKI, supporting the biological plausibility of the model and reinforcing the clinical interpretability afforded by SHAP. 30
Model generalizability was systematically evaluated through external validation in three independent cohorts spanning different countries, healthcare systems, and population structures. Acceptable discrimination was maintained in the multicenter eICU (AUC=0.719) and TS-TAKI (AUC=0.769) cohorts, supporting the model’s applicability across diverse clinical settings. Notably, the model’s superior performance in the TS-TAKI cohort, which specifically comprised tumor-associated AKI patients, supports its specific applicability to oncology populations rather than functioning merely as a generic ICU AKI predictor. In contrast, model performance declined in the BAKIT cohort, with reduced discrimination (AUC=0.616) and net clinical benefit. This heterogeneity likely reflects differences in patient case mix, baseline AKI incidence, illness severity, ICU admission criteria, and clinical practice patterns across cohorts. The eICU cohort represented a lower-risk Western population with lower rates of mechanical ventilation and sepsis and lower SOFA scores, whereas the BAKIT cohort, a prospective multicenter Chinese cohort, showed moderate similarity in some variables but remained imbalanced in sepsis prevalence and SOFA score distribution. Substantial variation in baseline risk and outcome prevalence across cohorts (AKI incidence ranging from 15.3% to 75.8%) indicated marked case-mix differences. Importantly, such heterogeneity does not undermine the model’s overall value but rather highlights the necessity of local validation and recalibration prior to clinical implementation. 31 Systematic shifts in baseline risk and predictor distributions can drive miscalibration even when individual predictors appear broadly similar, emphasizing that external validation should assess not only discrimination but also calibration across diverse settings.
Beyond predictive accuracy, interpretability is essential for clinical translation. By integrating SHAP, complex machine learning outputs were transformed into intuitive, patient-level explanations. Global SHAP analyses clarified the relative importance of predictors, while individual-level visualizations illustrated how specific variables contributed to predicted risk. This approach facilitates clinician understanding, enhances trust in model outputs, and supports patient-level risk communication.32 To facilitate clinical use, the XGBoost-AKI model was deployed as an online prediction tool capable of generating real-time AKI risk estimates and SHAP-based explanations using routinely available ICU variables. This tool is intended as a clinical decision support aid rather than a replacement for clinical judgment. Its primary value lies in improving early risk recognition and enabling more targeted management, for example by prompting enhanced hemodynamic monitoring, guiding nephrotoxin avoidance, facilitating timely fluid stewardship, and triggering early nephrology consultation for high-risk patients. Such integration into ICU workflows could support a proactive rather than reactive kidney care paradigm, potentially mitigating the progression from early renal stress to overt AKI.
Limitations
Several limitations should be acknowledged. First, this study was primarily based on retrospective databases, and selection bias and residual confounding cannot be fully excluded despite multicenter external validation. Prospective studies are therefore required to evaluate real-world impact on clinical decision-making and patient outcomes. Second, although tumor stage and treatment-specific variables were intentionally excluded to enhance feasibility during early ICU admission, their absence may limit the characterization of certain oncology-related risk dimensions. Future studies incorporating dynamic oncologic information may further refine model performance. Third, calibration performance declined in some external cohorts, particularly in the BAKIT dataset, indicating that population-specific recalibration may be necessary prior to clinical deployment. Fourth, AKI was defined using KDIGO creatinine and urine output criteria, which may fail to capture subclinical kidney injury detectable by emerging biomarkers. Finally, the model utilizes data from the first 24 hours of ICU admission to predict AKI within 72 hours; while this window aligns with the need for early intervention, it does not capture dynamic physiological changes occurring beyond the initial 24-hour period. Furthermore, for patients with hyperacute kidney deterioration occurring shortly after ICU admission, the boundary between true early prediction and contemporaneous recognition of incipient AKI may narrow; this trade-off between timeliness and lead-time is inherent to early-warning models in critical care.
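The threshold-based nature of the KDIGO definition noted above can be sketched as follows. This Python example encodes the widely cited KDIGO criteria: a serum creatinine rise of at least 0.3 mg/dL within 48 hours, a rise to at least 1.5 times baseline, or urine output below 0.5 mL/kg/h for at least 6 hours. The function names, the input layout, and the simplification that all readings fall within the 7-day window of the relative criterion are assumptions for illustration; this is not the study's extraction code.

```python
def kdigo_creatinine_aki(baseline, readings, eps=1e-9):
    """Check the KDIGO creatinine criteria.

    baseline: baseline serum creatinine in mg/dL.
    readings: list of (hours_since_admission, creatinine_mg_dl), ordered by time.
              Assumed here to all lie within 7 days of the baseline value.
    """
    # Relative criterion: creatinine reaches at least 1.5 times baseline.
    for _, scr in readings:
        if scr >= 1.5 * baseline - eps:
            return True
    # Absolute criterion: rise of at least 0.3 mg/dL between two readings <= 48 h apart.
    for i, (t0, s0) in enumerate(readings):
        for t1, s1 in readings[i + 1:]:
            if t1 - t0 <= 48 and s1 - s0 >= 0.3 - eps:
                return True
    return False


def kdigo_urine_aki(hourly_ml_per_kg, hours=6):
    """Check for urine output below 0.5 mL/kg/h over `hours` consecutive hours."""
    run = 0
    for rate in hourly_ml_per_kg:
        run = run + 1 if rate < 0.5 else 0
        if run >= hours:
            return True
    return False


# Hypothetical patient: creatinine rises by 0.2 mg/dL in 24 h, below both thresholds.
subthreshold = kdigo_creatinine_aki(1.0, [(0, 1.0), (24, 1.2)])
# Hypothetical patient: creatinine rises by 0.35 mg/dL in 24 h, meeting the absolute criterion.
overt = kdigo_creatinine_aki(1.0, [(0, 1.0), (24, 1.35)])
print(subthreshold, overt)
```

The first hypothetical patient is labeled as not having AKI despite a measurable creatinine rise, which illustrates how injury below the definitional thresholds is treated as a negative outcome during model training and evaluation.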
Conclusion
In summary, we developed and externally validated an interpretable XGBoost-based model for early AKI risk stratification in critically ill patients with digestive system tumors. By leveraging routinely available ICU data and explainable artificial intelligence techniques, the model achieves a balance among predictive performance, transparency, and clinical feasibility. Although model performance varied across populations, these findings support the potential utility of this approach for early risk identification and targeted prevention. They also emphasize the necessity of local validation and recalibration prior to widespread clinical adoption.
Supplemental material
Supplemental material for Multicenter validation of an explainable machine learning model for early prediction of acute kidney injury in critically ill patients with digestive system tumors by DunZhu Guo, Jing Bai, Jian Zhang, Xiuming Xi, YuJuan Chen, ZhiPeng Luo, Kai Feng, JiangWei Zeng, MengXin Zhang, WeiQin Dong, XinXin Xu, Rui Wang, Yu Zhang in DIGITAL HEALTH
Footnotes
Acknowledgements
We extend our special thanks to our colleagues for their dedicated efforts in data acquisition and for providing valuable suggestions on this manuscript. We also gratefully acknowledge the assistance of artificial intelligence tools, which were used solely for language polishing and partial code optimization.
Ethical considerations
This study was conducted in accordance with the Declaration of Helsinki and relevant institutional and national ethical standards. The development cohort was obtained from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, which contains fully de-identified data. Database access was granted after completion of the required ethics training (Certification No. 59209829), and informed consent was waived. External validation was conducted using the eICU Collaborative Research Database (eICU-CRD), the Beijing Acute Kidney Injury Clinical Trial (BAKIT) database, and the Tangshan Cohort Database for Tumor-Associated Acute Kidney Injury (TS-TAKI Cohort). The eICU-CRD data are fully de-identified, and informed consent was waived. Use of the BAKIT database was approved by the Ethics Committee of Fuxing Hospital, Capital Medical University (Approval No. 2010FXHEC–KF026). The TS-TAKI Cohort was approved by the institutional ethics committee (Approval No. RMYY-LLKS-2024039) and registered with the Chinese Clinical Trial Registry (ChiCTR2500103958). All data were anonymized, and no identifiable personal information was accessed.
Author contributions
G.D. and B.J. conceived and designed the study, performed data extraction, developed the prediction models, and drafted the manuscript.
Z.J., F.K., Z.J.W., Z.M.X., D.W.Q., X.X.X., and W.R. were responsible for data cleaning, statistical analysis, and interpretation of the results.
X.X.M., C.Y.J., and L.Z.P. contributed to external validation data acquisition and provided methodological guidance.
Z.Y. conceived and supervised the study, reviewed and revised the manuscript, secured funding, and served as the corresponding author.
All authors read and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Tangshan City Municipal-level Science and Technology Project (2025, Project No. 25120204C), the Hebei Province “333 Talent Project” (2024, Project No. C2024071), and the Hebei Provincial Medical Scientific Research Project Plan (2023, Project No. 20231803).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data are available from the corresponding author on reasonable request.
Supplemental material
Supplemental material for this article is available online.
Appendix
References
