Abstract

Objectives

Sepsis-associated encephalopathy (SAE) is a prevalent complication among critically ill sepsis patients with poor outcomes. This study aimed to develop and validate an interpretable machine learning model for predicting mortality in SAE patients to support clinical decision-making.

Methods

The study utilized two large critical care databases: the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 3.1) database for model construction and internal validation, and the eICU Collaborative Research Database (eICU-CRD, version 2.0.1) for external validation. The XGBoost model was trained to predict patient mortality at 28 and 90 days after ICU admission, and performance was assessed by indicators including the area under the curve (AUC). To improve the interpretability of the model, SHapley Additive exPlanations (SHAP) analysis revealed key features affecting prognosis at both population and individual levels.

Results

A total of 4922 SAE patients were included, with 39 variables selected for model development. The XGBoost model performed best, with internal validation AUCs of 0.930 (95% CI: 0.917–0.942) and 0.906 (95% CI: 0.891–0.919) for 28-day and 90-day predictions, respectively. Notably, external validation achieved AUCs of 0.771 (95% CI: 0.748–0.792) and 0.759 (95% CI: 0.736–0.782), respectively. The Shapley value analytical framework was systematically applied to decode feature importance patterns and illuminate individual prognostic determinants.

Conclusions

Validated across two large critical care databases, the interpretable XGBoost model serves as a reliable tool for mortality prediction in SAE patients, which may help clinicians identify high-risk SAE patients early and optimize management to improve patient outcomes.

Keywords

Sepsis-associated encephalopathy, sepsis, mortality, MIMIC-IV database, machine learning, SHAP

Introduction

Sepsis-associated encephalopathy (SAE) is the most prevalent form of secondary brain dysfunction in sepsis patients, with an incidence rate of over 70%.^1,2 Its pathogenesis is driven by microglia-mediated neuroinflammation induced by heat shock proteins, among other multifactorial mechanisms.³ Clinically, SAE manifests as a spectrum of neurological impairments, ranging from altered consciousness and cognitive decline to delirium and coma, all of which substantially worsen patient outcomes.^4,5 SAE is associated with significantly higher mortality compared to sepsis without encephalopathy, with rates reported between 50% and 70%, particularly among elderly patients and those with multiple organ dysfunction.⁶ Beyond acute mortality, SAE leads to persistent neurological deficits that cause long-term cognitive impairments and a diminished quality of life,⁷ making it both a major determinant of short-term mortality and a critical predictor of long-term prognosis in sepsis patients.⁸

Accurate mortality prediction in SAE patients is essential for guiding clinical decisions, as it facilitates early, targeted intervention in high-risk patients. Although widely used, conventional scores such as APACHE II and SOFA demonstrate limited predictive performance in this population.^9,10 Thus, there is an urgent need for the development of more precise and personalized prognostic models to improve risk stratification and clinical management strategies.

Recent machine learning (ML) models have made substantial contributions to the medical field, especially in disease prediction and prognosis. These models are adept at handling high-dimensional, nonlinear clinical data, revealing intricate relationships missed by traditional statistics.^11,12 Furthermore, the integration of explainable artificial intelligence (XAI) techniques, such as SHapley Additive exPlanations (SHAP), has greatly enhanced model transparency by elucidating the contribution of individual features to prediction outcomes.¹³ By revealing the decision-making process, this interpretability fosters clinician trust and facilitates the creation of personalized treatments, thereby maximizing the clinical utility of ML.

In this study, we aimed to develop and externally validate an explainable ML model to predict mortality in SAE patients. Additionally, we sought to clarify the importance of various features and to elucidate the model's workings through the SHAP method.

Methods

Data source

In this retrospective cohort study, data were obtained from two large-scale public databases: the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 3.1) and the eICU Collaborative Research Database (eICU-CRD, version 2.0.1).^14–16 The MIMIC-IV database was used for model development and internal validation. It includes clinical data from over 380,000 patients admitted to the Beth Israel Deaconess Medical Center in Boston, Massachusetts, between 2008 and 2019. For external validation, the eICU-CRD dataset, which includes clinical data from over 200,000 ICU admissions across 208 US hospitals between 2014 and 2015, was utilized. Both databases provide comprehensive patient information, including demographics, vital signs, laboratory results, and treatment details.

The study was approved by the Institutional Review Boards (IRBs) of Beth Israel Deaconess Medical Center (Boston, Massachusetts) and the Massachusetts Institute of Technology (Cambridge, Massachusetts). Individual patient consent was waived, as all protected health information was de-identified. Access to the databases was granted to individuals who completed the Collaborative Institutional Training Initiative (CITI) examinations. One author (Ziyi Wang) obtained access to both databases and was responsible for data extraction (Certification Number: 66982272).

Participant selection

At present, there is no unified diagnostic standard for SAE. In this study, SAE was defined as sepsis that met the Sepsis-3 criteria upon admission to the ICU, combined with Glasgow Coma Scale (GCS) <15 or delirium within the first 24 h of ICU admission.^8,17

Exclusion criteria included^6,8,17–19: (1) age < 18 years; (2) not the first ICU admission; (3) ICU stay duration < 24 hours; (4) presence of primary brain injury (e.g. traumatic brain injury, ischemic stroke, hemorrhagic stroke, epilepsy, or intracranial infection); (5) mental disorders; (6) long-term alcohol abuse or drug addiction; (7) metabolic encephalopathy, hepatic encephalopathy, hypertensive encephalopathy, hypoglycemic coma, and other liver and kidney diseases that affect consciousness; (8) acute or chronic liver disease; and (9) severe electrolyte or glucose disturbances, including hyponatremia (<120 mmol/L), hyperglycemia (>180 mg/dL), or hypoglycemia (<54 mg/dL), or partial pressure of carbon dioxide (PaCO₂) ≥ 80 mmHg.
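The case definition and a subset of the exclusion criteria above can be sketched as a simple cohort filter. The record layout and field names below are hypothetical (they are not MIMIC-IV or eICU-CRD column names), and only the criteria with explicit numeric or yes/no thresholds are encoded; this is an illustration of the logic, not the extraction code used in the study.

```python
from dataclasses import dataclass


@dataclass
class IcuStay:
    # Hypothetical fields for illustration; not actual database columns.
    age: int
    first_icu_admission: bool
    icu_hours: float
    meets_sepsis3: bool
    min_gcs_24h: int
    delirium_24h: bool
    primary_brain_injury: bool
    min_sodium: float   # mmol/L
    max_glucose: float  # mg/dL
    min_glucose: float  # mg/dL
    max_paco2: float    # mmHg


def has_sae(stay):
    """Sepsis-3 sepsis plus GCS < 15 or delirium in the first 24 h."""
    return stay.meets_sepsis3 and (stay.min_gcs_24h < 15 or stay.delirium_24h)


def is_excluded(stay):
    """Subset of the exclusion criteria that have explicit thresholds."""
    return (
        stay.age < 18
        or not stay.first_icu_admission
        or stay.icu_hours < 24
        or stay.primary_brain_injury
        or stay.min_sodium < 120
        or stay.max_glucose > 180
        or stay.min_glucose < 54
        or stay.max_paco2 >= 80
    )


stays = [
    # Eligible: septic, GCS 13, no exclusion criterion met.
    IcuStay(72, True, 60.0, True, 13, False, False, 135.0, 150.0, 90.0, 45.0),
    # Excluded: ICU stay shorter than 24 hours.
    IcuStay(65, True, 12.0, True, 10, True, False, 138.0, 140.0, 85.0, 40.0),
    # Not SAE: GCS 15 and no delirium.
    IcuStay(58, True, 48.0, True, 15, False, False, 140.0, 120.0, 95.0, 38.0),
    # Excluded: hyperglycemia above 180 mg/dL.
    IcuStay(80, True, 72.0, True, 9, False, False, 133.0, 240.0, 110.0, 42.0),
]

eligible = [s for s in stays if has_sae(s) and not is_excluded(s)]
print(len(eligible), "of", len(stays), "stays retained")
```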

Observation indicators

In this study, we used Navicat Premium (version 17.0) and structured query language (SQL) to extract data recorded within the first 24 hours of ICU admission from the two databases. Six key categories of characteristics from the first day of ICU admission were included: (1) demographics: age, gender, weight, and race; (2) ICU length of stay (Los icu); (3) scale scores: Acute Physiology and Chronic Health Evaluation II (APACHE II), the Sequential Organ Failure Assessment (SOFA) score, and the Glasgow Coma Scale (GCS) score; (4) treatment: mechanical ventilation, hemodialysis, continuous renal replacement therapy (CRRT), and vasoactive agent use; (5) vital signs: heart rate, systolic blood pressure (SBP), diastolic blood pressure (DBP), mean blood pressure (MBP), respiratory rate, temperature, and percutaneous arterial oxygen saturation (SpO₂); (6) laboratory indicators: white blood cells (WBC), creatinine, blood urea nitrogen (BUN), platelets, calcium, chloride, hemoglobin, potassium, lactate, and glucose. For variables with multiple measurements within 24 hours, both the maximum and minimum values were considered.^10,20 To reduce bias from missing data, variables with more than 20% missing values were excluded (Supplemental Figure S1), while the remaining variables were handled using the multiple imputation by chained equations (MICE) method. The primary outcome measure was 28-day mortality in patients with SAE, and the secondary outcome measure was 90-day mortality.
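Two of the preprocessing steps described above, taking the first-day minimum and maximum of repeated measurements and dropping variables with more than 20% missing values, can be sketched in a few lines. The table layout, variable names, and values are made up for illustration, and the MICE imputation step is deliberately not reproduced here.

```python
def first_day_extremes(measurements):
    """Return (min, max) of the observed values, or (None, None) if none exist."""
    observed = [m for m in measurements if m is not None]
    if not observed:
        return None, None
    return min(observed), max(observed)


def missing_fraction(column):
    """Share of entries in a column that are missing (None)."""
    return sum(value is None for value in column) / len(column)


def drop_sparse_columns(table, max_missing=0.20):
    """Keep only columns whose missing fraction does not exceed max_missing."""
    return {
        name: column
        for name, column in table.items()
        if missing_fraction(column) <= max_missing
    }


# Hypothetical heart-rate readings for one patient during the first 24 hours.
hr_min, hr_max = first_day_extremes([88, None, 112, 95, 76])

# Hypothetical per-patient table (one list entry per patient).
table = {
    "heart_rate_max": [112, 98, None, 130, 101],   # 20% missing: kept
    "lactate_max": [2.1, None, None, 4.0, 1.3],    # 40% missing: dropped
    "age": [72, 81, 65, 59, 90],                   # complete: kept
}
kept = drop_sparse_columns(table)
print(hr_min, hr_max, sorted(kept))
```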

Establishment and validation of the prediction models

The Boruta algorithm is a feature selection technique designed to identify the most influential features affecting a target variable within a dataset.²¹ It works by generating shadow features (randomly shuffled copies of the original features) and computing a Z-score for each feature. A feature is deemed important if its Z-score significantly exceeds the maximum Z-score of the shadow features over multiple independent tests.
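The idea of comparing each real feature against shuffled "shadow" copies can be illustrated with a deliberately simplified sketch. Two substitutions are made here and should be read as assumptions: the importance measure is the absolute Pearson correlation with the outcome rather than a random-forest Z-score, and a feature is kept when it beats the best shadow in more than half of the trials rather than through Boruta's statistical test. The data are synthetic.

```python
import random
from statistics import mean


def abs_corr(xs, ys):
    """Absolute Pearson correlation; stands in for a forest-based importance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_selection(features, outcome, trials=20, seed=0):
    """Keep features that beat the best shuffled copy in most trials."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(trials):
        # Build one shadow per feature by shuffling its values.
        shadow_max = 0.0
        for column in features.values():
            shuffled = list(column)
            rng.shuffle(shuffled)
            shadow_max = max(shadow_max, abs_corr(shuffled, outcome))
        # A real feature scores a hit when it beats the best shadow.
        for name, column in features.items():
            if abs_corr(column, outcome) > shadow_max:
                hits[name] += 1
    return [name for name, count in hits.items() if count > trials / 2]


# Synthetic data: "signal" tracks the outcome, "noise" is uncorrelated with it.
outcome = [0, 1] * 20
features = {
    "signal": [y + 0.1 * (i % 3) for i, y in enumerate(outcome)],
    "noise": [0, 0, 1, 1] * 10,
}
selected = shadow_selection(features, outcome)
print(selected)
```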

For this study, the MIMIC-IV dataset was divided into training and internal validation sets using five-fold cross-validation, while the external validation set was derived from the eICU-CRD dataset. To predict 28-day and 90-day mortality in SAE patients, we employed eight ML models: XGBoost, AdaBoost, decision tree, K-nearest neighbors (KNN), multi-layer perceptron (MLP), Naive Bayes (NB), support vector machine (SVM), and logistic regression (LR). We used grid search with cross-validation to tune the hyperparameters of each algorithm, and applied both undersampling and synthetic minority over-sampling technique (SMOTE) resampling methods to address the issue of data imbalance. Model performance was assessed using a variety of metrics, including the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, accuracy, and F1 score. The model exhibiting the highest AUC was selected as the best model. Additionally, decision curve analysis (DCA) was employed to evaluate clinical utility, and a clinical impact curve (CIC) was generated to determine the optimal threshold probability for our model.
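For readers less familiar with the evaluation metrics listed above, the following sketch computes sensitivity, specificity, accuracy, precision, and F1 score from a confusion matrix, and the AUC as the probability that a randomly chosen non-survivor receives a higher score than a randomly chosen survivor. The labels, scores, and the 0.5 cut-off are made-up illustrative values, and the sketch assumes both classes are present and at least one positive prediction is made.

```python
def classification_metrics(y_true, y_pred):
    """Threshold-based metrics from binary labels and binary predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative data: 1 = died, 0 = survived; scores are predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
model_auc = auc(y_true, scores)
print(metrics, round(model_auc, 3))
```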

Statistical analysis

For continuous variables, the data are expressed as the mean ± standard deviation when normally distributed. For non-normal distributions, the data are expressed as the median with the interquartile range (IQR). Categorical variables are expressed as numbers and percentages. Continuous variable comparisons were performed using the t-test or the Wilcoxon rank-sum test, and categorical data were analyzed using the chi-square test or Fisher's exact test.

While ML models are powerful predictors, their lack of interpretability remains a barrier to clinical use. Our study leveraged the SHAP method to illuminate the predictions of these models.¹³ By computing the Shapley values for each feature, SHAP can quantify its specific contribution to individual predictions. This provides clinicians with intuitive visual representations of how and to what extent each factor influences the result, building trust in the model's decision-making process (Supplemental Method S1).
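The Shapley value underlying SHAP can be made concrete with a brute-force calculation on a toy model. The sketch below enumerates every subset of features and averages each feature's marginal contribution with the standard Shapley weights. The risk function, feature names, and numbers are invented for illustration; SHAP libraries use far more efficient algorithms for tree models and handle "absent" features by averaging over background data, which this toy example does not attempt.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all feature subsets."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = value_fn(set(subset) | {feature})
                without_f = value_fn(set(subset))
                total += weight * (with_f - without_f)
        phi[feature] = total
    return phi


def toy_risk(present):
    """Hypothetical mortality risk given which risk factors are present."""
    risk = 0.10  # baseline risk with no factor present
    if "high_bun" in present:
        risk += 0.20
    if "low_gcs" in present:
        risk += 0.15
    if "high_bun" in present and "low_gcs" in present:
        risk += 0.05  # interaction term, shared between the two features
    return risk  # "normal_weight" has no effect in this toy model


features = ["high_bun", "low_gcs", "normal_weight"]
phi = shapley_values(toy_risk, features)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")

# The contributions add up to the prediction minus the baseline.
print(round(sum(phi.values()), 3), round(toy_risk(set(features)) - toy_risk(set()), 3))
```

Because the contributions sum to the difference between the individual prediction and the baseline, each one can be read as the share of the predicted risk attributable to that feature.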

Statistical significance was determined using a two-tailed P-value < 0.05. All statistical analyses were conducted using R software (version 4.4.2) and Python (version 3.7.2).

Results

The inclusion and exclusion criteria for patient selection are illustrated in Figure 1. This study enrolled a total of 4922 patients with SAE, comprising 2027 patients from the MIMIC-IV database and 2895 patients from the eICU-CRD database. Based on their survival status at 28 or 90 days, the patients were categorized into two groups: “survival” and “non-survival.”

Figure 1.

Patient recruitment flowchart. GCS: Glasgow coma scale; SAE: sepsis-associated encephalopathy; SpO₂: percutaneous arterial oxygen saturation.

Baseline characteristics

The differences in characteristics between the survival and non-survival groups at 28 days in the MIMIC-IV dataset are summarized in Table 1, with a 28-day mortality of 17.67% (n = 358). Univariate analysis revealed significant differences between the groups for the following variables: age, gender, weight, los_icu, APACHE II, SOFA, GCS, mechanical ventilation, CRRT, and various physiological parameters, including minimum and maximum values for heart rate, SBP, DBP, respiratory rate, SpO₂, WBC, creatinine, BUN, calcium, lactate, and glucose, as well as the minimum values of MBP and potassium, and the maximum values of temperature, platelets, and hemoglobin (P < 0.05).

Table 1.

Baseline characteristics of the SAE patients in the MIMIC-IV dataset.

Variable	Survival	Non-survival	P-value
Demographics
Age, years	72.00 (18.00–101.00)	81.00 (20.00–98.00)	0.00
Gender			0.01
Female	687 (33.89)	181 (8.93)
Male	982 (48.45)	177 (8.73)
Weight	79.80 (67.60–94.40)	70.95 (60.76–83.45)	0.00
Race			0.44
White	1225 (60.43)	257 (12.68)
Hispanic	32 (1.58)	4 (0.20)
Asian	48 (2.37)	7 (0.35)
Black	106 (5.32)	24 (1.18)
Other	258 (12.73)	66 (3.26)
Los_icu, days	2.23 (1.00–74.31)	3.33 (1.00–26.94)	0.00
Scale scores
APACHE II	22.00 (18.00–26.00)	23.00 (19.00–27.75)	0.00
SOFA	5.00 (0.00–22.00)	6.00 (1.00–20.00)	0.00
GCS	14.00 (3.00–15.00)	13.00 (3.00–15.00)	0.00
Treatment
Mechanical ventilation	918 (55.00)	170 (47.49)	0.01
Hemodialysis	39 (2.34)	9 (2.51)	0.99
Continuous renal replacement therapy	45 (2.70)	25 (6.98)	0.00
Vasoactive agent use	811 (48.59)	185 (51.68)	0.32
Vital signs
Heart rate min, min⁻¹	68.00 (24.00–131.00)	75.00 (31.00–124.00)	0.00
Heart rate max, min⁻¹	100.00 (42.00–186.00)	111.00 (55.00–191.00)	0.00
SBP min, mmHg	88.00 (37.00–149.00)	84.00 (26.00–137.00)	0.00
SBP max, mmHg	143.00 (92.00–236.00)	140.00 (102.00–311.00)	0.00
DBP min, mmHg	44.00 (10.00–87.00)	41.00 (11.00–82.00)	0.00
DBP max, mmHg	80.00 (45.00–218.00)	85.00 (48.00–228.00)	0.00
MBP min, mmHg	57.50 (2.00–99.00)	53.00 (1.00–92.00)	0.00
MBP max, mmHg	98.00 (66.00–296.00)	97.00 (58.00–283.00)	0.15
Resprate min, min⁻¹	12.00 (1.00–30.00)	13.00 (2.00–30.00)	0.00
Resprate max, min⁻¹	27.00 (14.00–69.00)	30.00 (12.00–60.00)	0.00
Temperature min, °C	36.33 (32.10–38.50)	36.28 (33.39–37.50)	0.26
Temperature max, °C	37.39 (32.94–41.00)	37.17 (35.89–40.00)	0.00
SpO₂ min, %	93.00 (40.00–100.00)	90.00 (55.00–100.00)	0.00
SpO₂ max, %	100.00 (94.00–100.00)	100.00 (94.00–100.00)	0.00
Laboratory indicators
WBC max, K/µL	15.20 (0.40–257.40)	17.00 (0.20–164.20)	0.01
WBC min, K/µL	6.50 (0.00–66.30)	8.90 (0.10–104.70)	0.00
Creatinine max, mg/dL	1.20 (0.30–32.30)	1.77 (0.20–10.80)	0.00
Creatinine min, mg/dL	0.70 (0.00–8.30)	1.00 (0.10–8.50)	0.00
BUN max, mg/dL	26.00 (3.00–186.00)	45.00 (3.00–186.00)	0.00
BUN min, mg/dL	12.00 (1.00–127.00)	25.50 (3.00–124.00)	0.00
Platelets max, K/µL	298.00 (21.00–1592.00)	235.00 (14.00–1060.00)	0.00
Platelets min, K/µL	135.00 (5.00–1220.00)	144.00 (5.00–808.00)	0.72
Calcium max, mg/dL	8.90 (7.20–15.60)	8.70 (6.50–13.70)	0.00
Calcium min, mg/dL	7.86 (0.00–10.00)	7.70 (4.40–10.40)	0.00
Chloride max, mmol/L	109.00 (89.00–137.00)	109.00 (82.00–130.00)	0.00
Chloride min, mmol/L	98.00 (74.00–118.00)	100.00 (76.00–117.00)	0.00
Hemoglobin max, g/dL	12.20 (7.30–20.10)	10.70 (6.10–17.90)	0.00
Hemoglobin min, g/dL	8.40 (0.00–15.50)	8.60 (4.40–15.50)	0.60
Potassium max, mmol/L	5.00 (3.30–14.70)	5.00 (2.60–10.00)	0.31
Potassium min, mmol/L	3.40 (1.40–5.10)	3.60 (2.10–6.30)	0.00
Lactate max, mmol/L	2.00 (−2.14–20.20)	2.20 (−0.43–20.20)	0.00
Lactate min, mmol/L	1.28 (0.20–4.10)	1.35 (0.40–9.00)	0.00
Glucose max, mg/dL	147.00 (62.00–180.00)	134.00 (68.00–180.00)	0.00
Glucose min, mg/dL	86.00 (21.00–157.00)	90.00 (25.00–168.00)	0.00

SAE: sepsis-associated encephalopathy; MIMIC-IV: Medical Information Mart for Intensive Care IV; APACHE II: acute physiology and chronic health evaluation II; SOFA: sequential organ failure assessment; GCS: Glasgow coma scale; SBP: systolic blood pressure; DBP: diastolic blood pressure; MBP: mean blood pressure; SpO₂: percutaneous arterial oxygen saturation; WBC: white blood cells; BUN: blood urea nitrogen.

Feature selection

The results of feature selection using the Boruta algorithm are shown in Figure 2. The 39 variables most strongly associated with 28-day mortality, ranked by their Z-values, include los_icu, SOFA, age, weight, GCS, CRRT, APACHE II, along with the maximum and minimum values of platelets, hemoglobin, calcium, BUN, lactate, creatinine, heart rate, chloride, WBC, potassium, glucose, respiratory rate, SpO₂, DBP, and SBP. Additionally, minimum MBP and maximum temperature were also selected.

Figure 2.

Feature selection based on the Boruta algorithm. The horizontal axis denotes the name of each variable, while the vertical axis illustrates the Z-value corresponding to each variable. The box plot displays the Z-values for every variable throughout the model computation. Green boxes indicate the 39 significant variables, yellow boxes represent provisional attributes, and red boxes signify variables deemed unimportant. APACHE II: acute physiology and chronic health evaluation II; SOFA: sequential organ failure assessment; GCS: Glasgow coma scale; CRRT: continuous renal replacement therapy; SBP: systolic blood pressure; DBP: diastolic blood pressure; MBP: mean blood pressure; SpO₂: percutaneous arterial oxygen saturation; WBC: white blood cells; BUN: blood urea nitrogen; Los icu: ICU length of stay.

Figure 3.

Comparison of AUCs among eight machine learning models. AUCs for the prediction of 28-day (a) and 90-day mortality (b) in the internal validation cohort. AUCs for the prediction of 28-day (c) and 90-day mortality (d) in the external validation cohort. AUC: area under the receiver operating characteristic curve.

Model performance comparisons

The accuracy, recall, specificity, precision, and F1 score for the eight ML models are presented in Table 2. We first evaluated the predictive performance of the APACHE II and SOFA scores. During internal validation, the AUC of APACHE II was 0.562 (95% CI: 0.529–0.597) for 28-day and 0.611 (95% CI: 0.584–0.637) for 90-day mortality, while SOFA achieved AUCs of 0.634 (95% CI: 0.603–0.670) and 0.638 (95% CI: 0.610–0.664), respectively. Performance in the external validation cohort was similarly limited, with APACHE II showing AUCs of 0.537 (95% CI: 0.509–0.568) for 28-day and 0.607 (95% CI: 0.581–0.633) for 90-day mortality, and SOFA achieving 0.602 (95% CI: 0.572–0.630) and 0.641 (95% CI: 0.615–0.666), respectively.

Table 2.

Performance metrics for predicting 28-day and 90-day mortality of SAE patients in the internal and external validation cohorts.

Internal validation cohort
28-day mortality
Models	Accuracy	Recall	Specificity	Precision	F1	AUC (95% CI)
APACHE II	0.823	0.823	1.000	0.678	0.744	0.562 (0.529–0.597)
SOFA	0.829	0.829	0.996	0.811	0.761	0.634 (0.603–0.670)
AdaBoost	0.886	0.886	0.951	0.879	0.881	0.903 (0.885–0.920)
Decision tree	0.761	0.761	0.779	0.829	0.782	0.779 (0.751–0.804)
K-nearest neighbors	0.840	0.840	0.893	0.846	0.842	0.841 (0.819–0.863)
Logistic regression	0.882	0.882	0.955	0.874	0.875	0.899 (0.881–0.917)
Multi-layer perceptron	0.784	0.784	0.785	0.858	0.806	0.873 (0.852–0.893)
Naive Bayes	0.836	0.836	0.869	0.856	0.844	0.864 (0.843–0.883)
Support vector machine	0.812	0.812	0.829	0.856	0.826	0.858 (0.836–0.877)
XGBoost	0.889	0.889	0.959	0.882	0.882	0.930 (0.917–0.942)
90-day mortality
Models	Accuracy	Recall	Specificity	Precision	F1	AUC (95% CI)
APACHE II	0.831	0.831	0.999	0.790	0.758	0.611 (0.584–0.637)
SOFA	0.831	0.831	1.000	0.748	0.757	0.638 (0.610–0.664)
AdaBoost	0.839	0.839	0.921	0.833	0.834	0.884 (0.867–0.900)
Decision tree	0.747	0.747	0.771	0.784	0.759	0.781 (0.756–0.804)
K-nearest neighbors	0.795	0.795	0.867	0.796	0.795	0.815 (0.794–0.836)
Logistic regression	0.849	0.849	0.935	0.842	0.842	0.883 (0.864–0.898)
Multi-layer perceptron	0.781	0.781	0.785	0.822	0.793	0.854 (0.835–0.873)
Naive Bayes	0.815	0.815	0.872	0.818	0.816	0.854 (0.836–0.872)
Support vector machine	0.801	0.801	0.830	0.820	0.808	0.845 (0.826–0.864)
XGBoost	0.858	0.858	0.937	0.852	0.852	0.906 (0.891–0.919)
External validation cohort
28-day mortality
Models	Accuracy	Recall	Specificity	Precision	F1	AUC (95% CI)
APACHE II	0.754	0.754	1.000	0.570	0.649	0.537 (0.509–0.568)
SOFA	0.764	0.764	0.994	0.763	0.677	0.602 (0.572–0.630)
AdaBoost	0.828	0.828	0.944	0.799	0.806	0.735 (0.711–0.760)
Decision tree	0.685	0.685	0.697	0.798	0.720	0.710 (0.685–0.735)
K-nearest neighbors	0.634	0.634	0.655	0.764	0.677	0.633 (0.606–0.660)
Logistic regression	0.691	0.691	0.694	0.811	0.727	0.754 (0.730–0.779)
Multi-layer perceptron	0.651	0.651	0.649	0.797	0.693	0.707 (0.680–0.733)
Naive Bayes	0.737	0.737	0.782	0.794	0.758	0.725 (0.700–0.749)
Support vector machine	0.627	0.627	0.614	0.799	0.673	0.705 (0.677–0.730)
XGBoost	0.698	0.698	0.697	0.819	0.734	0.771 (0.748–0.792)
90-day mortality
Models	Accuracy	Recall	Specificity	Precision	F1	AUC (95% CI)
APACHE II	0.821	0.821	0.999	0.779	0.743	0.607 (0.581–0.633)
SOFA	0.821	0.821	0.998	0.786	0.745	0.641 (0.615–0.666)
AdaBoost	0.816	0.816	0.935	0.787	0.794	0.729 (0.705–0.753)
Decision tree	0.642	0.642	0.655	0.767	0.680	0.659 (0.634–0.686)
K-nearest neighbors	0.666	0.666	0.703	0.757	0.698	0.639 (0.612–0.665)
Logistic regression	0.688	0.688	0.694	0.799	0.722	0.747 (0.723–0.770)
Multi-layer perceptron	0.643	0.643	0.644	0.781	0.682	0.689 (0.664–0.715)
Naive Bayes	0.738	0.738	0.793	0.780	0.754	0.717 (0.690–0.742)
Support vector machine	0.648	0.648	0.645	0.788	0.688	0.698 (0.673–0.723)
XGBoost	0.693	0.693	0.693	0.808	0.726	0.759 (0.736–0.782)

SAE: sepsis-associated encephalopathy; APACHE II: acute physiology and chronic health evaluation II; SOFA: sequential organ failure assessment; AUC: area under the receiver operating characteristic curve.

In contrast, all eight ML models demonstrated strong predictive performance (Figure 3). The XGBoost model achieved the highest discriminative ability, with AUCs of 0.930 (95% CI: 0.917–0.942) for 28-day mortality and 0.906 (95% CI: 0.891–0.919) for 90-day mortality, representing a 65.5% and 48.3% relative increase, respectively, over the APACHE II score, and a 46.7% and 42.0% relative increase over the SOFA score. Other ML models, including AdaBoost (28-day: 0.903; 90-day: 0.884) and LR (28-day: 0.899; 90-day: 0.883), also performed strongly. Simpler models such as decision tree and KNN showed more moderate performance, with AUCs ranging between 0.779 and 0.841 for 28-day and 0.781 and 0.815 for 90-day predictions. In the external validation cohort, predictive performance declined overall but remained acceptable for several models. XGBoost again yielded the highest AUCs: 0.771 (95% CI: 0.748–0.792) for 28-day mortality and 0.759 (95% CI: 0.736–0.782) for 90-day mortality, corresponding to a 43.6% and 25.0% relative improvement over APACHE II, and 28.1% and 18.4% over SOFA, respectively. LR was the second-best performer in this setting, with AUCs of 0.754 and 0.747, respectively. Model performance was lowest for KNN and decision tree, with AUC values between 0.633 and 0.710 across timepoints.
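The relative improvements quoted above follow from the AUCs in Table 2 as (model AUC − baseline AUC) / baseline AUC. A short sketch of that arithmetic, using only the values reported in Table 2 (the function and dictionary names are illustrative):

```python
def relative_gain_pct(model_auc, baseline_auc):
    """Relative AUC improvement over a baseline score, in percent."""
    return round(100 * (model_auc - baseline_auc) / baseline_auc, 1)


# AUCs taken from Table 2: (cohort, timepoint) -> XGBoost, APACHE II, SOFA.
table2_aucs = {
    ("internal", "28-day"): (0.930, 0.562, 0.634),
    ("internal", "90-day"): (0.906, 0.611, 0.638),
    ("external", "28-day"): (0.771, 0.537, 0.602),
    ("external", "90-day"): (0.759, 0.607, 0.641),
}

gains = {
    key: {
        "vs_apache_ii": relative_gain_pct(xgb, apache),
        "vs_sofa": relative_gain_pct(xgb, sofa),
    }
    for key, (xgb, apache, sofa) in table2_aucs.items()
}

for key, gain in gains.items():
    print(key, gain)
```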

Given the class imbalance inherent in mortality outcomes, we further evaluated model performance using the F1 score (Table 2). The XGBoost model consistently achieved the highest F1-scores in internal validation (28-day: 0.882; 90-day: 0.852) and competitive scores in external validation.

The DCA curves for the eight prediction models are shown in Figure 4. In both datasets, the XGBoost model provided a significantly higher net benefit than the other models. Furthermore, the CIC curve in Figure 5 was used to intuitively determine the optimal risk threshold for clinical application. A threshold of 0.6 was found to optimally balance sensitivity and specificity for identifying high-risk mortality, and was, therefore, chosen to inform clinical decision-making.
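As background for reading the DCA curves, net benefit at a threshold probability p_t is conventionally computed as TP/n − (FP/n) × p_t/(1 − p_t), and is compared with a "treat all" strategy. The sketch below applies this standard formula to made-up labels and risk scores at the 0.6 threshold; it is not the code used to produce Figure 4.

```python
def net_benefit(y_true, scores, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(t == 1 and s >= threshold for t, s in zip(y_true, scores))
    fp = sum(t == 0 and s >= threshold for t, s in zip(y_true, scores))
    odds = threshold / (1 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of labelling every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1 - threshold)
    return prevalence - (1 - prevalence) * odds


# Illustrative data: 1 = died, 0 = survived; scores are predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.65, 0.3, 0.2, 0.2, 0.1]

nb_model = net_benefit(y_true, scores, threshold=0.6)
nb_all = net_benefit_treat_all(y_true, threshold=0.6)
print(round(nb_model, 4), round(nb_all, 4))
```

A model is clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy).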

Figure 4.

Decision curve analysis (DCA) for different machine learning models. DCA curves for 28-day (a) and 90-day mortality (b) in the internal validation set. DCA curves for 28-day (c) and 90-day mortality (d) in the external validation set.

Figure 5.

Clinical impact curves (CICs) are used to evaluate the clinical utility of different models. CICs for 28-day (a) and 90-day mortality (b) in the internal validation set. CICs for 28-day (c) and 90-day mortality (d) in the external validation set. In CICs, the horizontal axis represents the risk threshold and its corresponding cost–benefit ratio, while the vertical axis compares the number of positive patients identified by the model to the actual number of true-positive patients.

Subgroup analysis

In the subgroup analysis, we evaluated the performance of the XGBoost model across different age groups. Table 3 presents the detailed results of the subgroup analysis. Compared with patients under 65 years old, the model demonstrated better performance in predicting prognosis for patients aged 65 years or older.

Table 3.

Summary of the performance analysis of the XGBoost model in age subgroups.

Cohort and subgroup	Timepoint	AUC (95% CI)
Internal validation	28-day
Age
<65		0.915 (0.898–0.933)
≥65		0.931 (0.885–0.949)
Internal validation	90-day
Age
<65		0.878 (0.834–0.916)
≥65		0.901 (0.883–0.917)
External validation	28-day
Age
<65		0.717 (0.665–0.765)
≥65		0.763 (0.736–0.789)
External validation	90-day
Age
<65		0.717 (0.666–0.764)
≥65		0.766 (0.740–0.792)

AUC: area under the receiver operating characteristic curve.

Interpretability analysis

In both the 28-day and 90-day mortality groups, the XGBoost model achieved the highest AUCs across all models, establishing it as the best model. We utilized SHAP analysis to interpret the model's predictions. Figure 6(a) and (c) display the top 20 predictors of 28-day and 90-day mortality in the external validation set. Figure 6(b) and (d) provide a more detailed display of the positive and negative relationships between features and results. The key predictors common to both time points include lowest BUN, highest calcium, highest chloride, lowest chloride, GCS, highest glucose, lowest glucose, highest heart rate, highest hemoglobin, Los icu, highest platelets, highest temperature, lowest WBC, and weight. Figure 7 demonstrates a prediction result generated by the XGBoost model for a patient with SAE who died within 28 days of ICU admission. Features marked with red arrows increase the risk of mortality, while those with blue arrows decrease the risk. By integrating the collective contributions of these factors, the model calculated a mortality probability of 77% for this patient, corresponding to the output value f(x) shown in the figure.

Figure 6.

Visual interpretation of the XGBoost model using SHAP. Feature-ranking plots (a) and summary plots (b) for predicting 28-day mortality in SAE. Feature-ranking plots (c) and summary plots (d) for predicting 90-day mortality in SAE. (e) Inference process of the model with a non-surviving patient. BUN: blood urea nitrogen; WBC: white blood cells; Los icu: ICU length of stay; SpO₂: percutaneous arterial oxygen saturation; GCS: Glasgow coma scale; DBP: diastolic blood pressure; SBP: systolic blood pressure; SHAP: SHapley Additive exPlanations; SAE: sepsis-associated encephalopathy.

Figure 7.

Inference process of the model with a non-surviving patient. GCS: Glasgow coma scale; SpO₂: percutaneous arterial oxygen saturation; Los icu: ICU length of stay; WBC: white blood cells.

Discussion

SAE is a severe complication of sepsis, contributing to high mortality rates. This study leveraged two large public databases, MIMIC-IV and eICU-CRD, to develop predictive models for estimating 28-day and 90-day mortality in SAE patients. Eight ML algorithms were utilized to develop a predictive model, incorporating 39 clinical variables recorded within the first 24 hours of ICU admission.

Previous studies have attempted to develop SAE mortality prediction models. Some studies relied on traditional LR methods and, despite achieving an average AUC of 0.799, often lacked external validation and model interpretability.^8,19,22 Several recent ML approaches have also been proposed, yet most focus on a single mortality endpoint, thereby overlooking the dynamic progression and individualized risk trajectories of SAE patients.^17,18 In contrast, our study introduces an interpretable multi-timepoint ML framework that not only predicts both 28-day and 90-day mortality but also incorporates SHAP-based explanations to clarify feature contributions. This study makes the following contributions.

First, we developed predictive models using eight ML algorithms capable of handling high-dimensional, nonlinear clinical data, uncovering hidden relationships, and improving predictive accuracy. To ensure robustness, we performed hyperparameter tuning via grid search cross-validation and addressed class imbalance with undersampling and SMOTE. Among the models, the XGBoost model demonstrated superior performance, outperforming the others in discriminative power, accuracy, and robustness. Its ensemble learning approach effectively captures nonlinear relationships while mitigating overfitting, making it particularly well-suited for clinical applications. In contrast, traditional scoring systems such as APACHE II and SOFA showed limited predictive performance for SAE mortality, likely due to their insensitivity to subtle clinical changes and inability to capture nonlinear risk factor interactions.

Second, our model simultaneously predicts mortality at both 28 and 90 days, offering a comprehensive risk profile that supports clinical decision-making across different time horizons. The 28-day mortality prediction directly correlates with acute neuroinflammatory cascades, enabling clinicians to prioritize interventions such as EEG monitoring and immunomodulatory therapies during the critical early phase.^23,24 In contrast, the 90-day mortality prediction reflects the long-term consequences of SAE, including persistent neuroinflammation and sepsis-induced immunosuppression, which are associated with secondary infections and delayed cognitive decline.^3,25 This dual timepoint approach aligns with the biphasic nature of SAE pathophysiology, thereby bridging the gap between acute management and chronic care planning.

Furthermore, we employed SHAP to interpret the XGBoost model's decision-making process, thereby enhancing its clinical transparency and trustworthiness. Analysis revealed that the five most influential features were minimum BUN, minimum WBC, maximum hemoglobin, Los icu, and weight. For instance, high BUN signals renal dysfunction and impaired perfusion.²⁶ A decrease in WBC indicates immune paralysis, predisposing to secondary infections and uncontrolled neuroinflammation. Abnormal elevation of hemoglobin suggests hemoconcentration and microcirculatory failure, which can exacerbate cerebral hypoxia.²⁷ A prolonged ICU stay often signifies irreversible organ dysfunction and heightened exposure to iatrogenic risks. Thus, SHAP analysis not only elucidates the predictive process but also provides evidence for the underlying pathophysiological mechanisms influencing SAE outcomes.

Finally, subgroup analysis revealed that the model performed more effectively in patients aged 65 years or older, exhibiting a higher AUC than in younger patients. This age-related disparity may be attributed to the more distinct physiological characteristics and higher baseline mortality risk among older adults, which likely enhances the model's discriminative capacity. Furthermore, we plotted the DCA and CIC curves to confirm the superior clinical utility of the XGBoost model. Collectively, these findings demonstrate that the model not only accurately identifies high-risk SAE patients but also offers a tangible foundation for guiding clinical decisions, with the potential to optimize treatment planning and ultimately improve patient outcomes.

Although this study has achieved significant results, there are still some limitations. First, the data used in this study were all sourced from public databases, and the cohort was restricted to SAE patients; the model's generalizability to broader ICU populations may therefore be limited. Future studies should incorporate multi-center data, including from private hospitals, and expand inclusion criteria to improve applicability. Second, despite including a wide range of variables, potential confounders such as sedation use, specific infection sites, and detailed treatment responses were not consistently available, which may affect outcome interpretation. Third, the decline in performance during external validation suggests the need for strategies such as domain adaptation or larger, more diverse samples. Future work will focus on real-time implementation and prospective validation in multicenter ICUs to further assess clinical utility.

Conclusion

In summary, this study develops and validates an interpretable, multi-timepoint ML model that significantly improves mortality prediction for SAE patients over conventional scoring systems. Future research should focus on refining the model, incorporating additional risk factors, and validating its performance across diverse patient populations to enhance its generalizability and clinical applicability.

Supplemental Material

Supplemental material, sj-docx-1-dhj-10.1177_20552076251393281 for Construction and validation of a mortality prediction model for patients with sepsis-associated encephalopathy: Interpretable machine learning approach by Ziyi Wang, Lingling Ge, Yongkui Zhu, Xiaoli Chen and Xia Cheng in DIGITAL HEALTH

Footnotes

ORCID iD

Ziyi Wang

Ethical statement

This study was approved by the Institutional Review Boards (IRBs) of Beth Israel Deaconess Medical Center (Boston, Massachusetts) and the Massachusetts Institute of Technology (Cambridge, Massachusetts). Individual patient consent was waived, as all protected health information was de-identified. Access to the databases was granted to individuals who completed the Collaborative Institutional Training Initiative (CITI) examinations. One author (Ziyi Wang) obtained access to both databases and was responsible for data extraction (Certification Number: 66982272).

Authors’ contributions

Ziyi Wang: conceptualization, data curation, formal analysis, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing–original draft, and writing–review and editing. Lingling Ge: conceptualization, writing–original draft, and writing–review and editing. Yongkui Zhu: conceptualization, writing–original draft, and writing–review and editing. Xiaoli Chen: conceptualization, project administration, supervision, and writing–review and editing. Xia Cheng: conceptualization, project administration, supervision, and writing–review and editing.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The data are open-source and freely available at https://www.physionet.org/content/mimiciv/3.1/ and https://www.physionet.org/content/eicu-crd-demo/2.0.1/. All original code has been deposited at .

Guarantor

Ziyi Wang

Supplemental material

Supplemental material for this article is available online.

References

1. Andonegui, Zelinski, Schubert, et al. Targeting inflammatory monocytes in sepsis-associated encephalopathy and long-term cognitive impairment. JCI Insight 2018; 3: e99364.

2. Xing, Huang, et al. Amantadine attenuates sepsis-induced cognitive dysfunction possibly not through inhibiting toll-like receptor 2. J Mol Med (Berlin, Germany) 2018; 96: 391–402.

3. Jing, Zuo, Fang, et al. Erbin protects against sepsis-associated encephalopathy by attenuating microglia pyroptosis via IRE1α/Xbp1s-Ca(2+) axis. J Neuroinflammation 2022; 19: 237.

4. Pan, Pei, et al. BML-111 reduces neuroinflammation and cognitive impairment in mice with sepsis via the SIRT1/NF-κB signaling pathway. Front Cell Neurosci 2018; 12: 267.

5. Dal-Pizzol, de Medeiros, Michels, et al. What animal models can tell us about long-term psychiatric symptoms in sepsis survivors: a systematic review. Neurother: J Am Soc Exp NeuroTher 2021; 18: 1393–1413.

6. Shi, Jing, et al. Prognostic analysis of elderly patients with pathogenic microorganisms positive for sepsis-associated encephalopathy. Front Microbiol 2024; 15: 1509726.

7. Yao, Chen, et al. Increased resting-state functional connectivity of the hippocampus in rats with sepsis-associated encephalopathy. Front Neurosci 2022; 16: 894720.

8. Yang, Liang, Geng, et al. Development of a nomogram to predict 30-day mortality of patients with sepsis-associated encephalopathy: a retrospective cohort study. J Intensive Care 2020; 8: 45.

9. Kang, Kim, et al. Machine learning algorithm to predict mortality in patients undergoing continuous renal replacement therapy. Crit Care (London, England) 2020; 24: 42.

10. Liu, Yeung, et al. Illness severity assessment of older adults in critical illness using machine learning (ELDER-ICU): an international multicentre study with subgroup bias evaluation. The Lancet Digital Health 2023; 5: e657–e667.

11. Sendak, Gao, Brajer, et al. Presenting machine learning model information to clinical end users with model facts labels. NPJ Digital Med 2020; 3: 41.

12. Liu, Wang, et al. Early prediction of treatment response to neoadjuvant chemotherapy based on longitudinal ultrasound images of HER2-positive breast cancer patients by Siamese multi-task network: a multicentre, retrospective cohort study. EClinicalMedicine 2022; 52: 101562.

13. Lundberg, Lee S-I. A Unified Approach to Interpreting Model Predictions. In: 31st conference on neural information processing systems (NIPS 2017), 2017.

14. Johnson AEW, Bulgarelli, Shen, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data 2023; 10: 1.

15. Pollard, Johnson AEW, Raffa, et al. The eICU collaborative research database, a freely available multi-center database for critical care research. Sci Data 2018; 5: 180178.

16. Goldberger, Amaral, Glass, et al. Physiobank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 2000; 101: E215–E220.

17. Guo, Cheng, Wang, et al. Factor analysis based on SHapley Additive exPlanations for sepsis-associated encephalopathy in ICU mortality prediction using XGBoost - a retrospective study based on two large database. Front Neurol 2023; 14: 1290117.

18. Peng, Yang, et al. Machine learning approach for the prediction of 30-day mortality in patients with sepsis-associated encephalopathy. BMC Med Res Methodol 2022; 22: 183.

19. Jin, Zhou, Chen, et al. Comprehensive risk factor-based nomogram for predicting one-year mortality in patients with sepsis-associated encephalopathy. Sci Rep 2024; 14: 23979.

20. Zhang, Hong. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit Care (London, England) 2019; 23: 112.

21. Kursa, Rudnicki. Feature selection with the Boruta package. J Stat Softw 2010; 36: 1–13.

22. Yang, Fei, et al. A nomogram for predicting sepsis-associated delirium: a retrospective study in MIMIC III. BMC Med Inform Decis Mak 2023; 23: 184.

23. Gao, Hernandes. Sepsis-associated encephalopathy and blood-brain barrier dysfunction. Inflammation 2021; 44: 2143–2150.

24. Zhou, et al. Regulation of hippocampal neuronal apoptosis and autophagy in mice with sepsis-associated encephalopathy by immunity-related GTPase M1. CNS Neurosci Ther 2020; 26: 177–188.

25. Chung, Wickel, Brunkhorst, et al. Sepsis-associated encephalopathy: from delirium to dementia? J Clin Med 2020; 9: 703.

26. Fang, Tang, Gao, et al. Association between blood urea nitrogen and delirium in critically ill elderly patients without kidney diseases: a retrospective study and Mendelian randomization analysis. CNS Neurosci Ther 2025; 31: e70201.

27. Sheng, Zhang, et al. Association between hemoglobin and in-hospital mortality in critically ill patients with sepsis: evidence from two large databases. BMC Infect Dis 2024; 24: 1450.
