
Abstract

Objective

We identified predictive factors and developed a novel machine learning (ML) model for predicting mortality risk in patients with sepsis-associated encephalopathy (SAE).

Methods

In this retrospective cohort study, data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) and eICU Collaborative Research Database were used for model development and external validation, respectively. The primary outcome was in-hospital mortality among patients with SAE; the observed in-hospital mortality rate was 14.74% (MIMIC-IV: 1112, eICU: 594). After feature selection using the least absolute shrinkage and selection operator (LASSO), we built nine ML models and a stacking ensemble model and determined the optimal model based on the area under the receiver operating characteristic curve (AUC). We used the Shapley additive explanations (SHAP) algorithm to interpret the optimal model.

Results

The study included 9943 patients. LASSO identified 15 variables. The stacking ensemble model achieved the highest AUC on the test set (0.807) and 0.671 on external validation. SHAP analysis highlighted Glasgow Coma Scale (GCS) and age as key variables. The model (https://sic1.shinyapps.io/SSAAEE/) can predict in-hospital mortality risk for patients with SAE.

Conclusions

We developed a stacked ensemble model with enhanced generalization capabilities using novel data to predict mortality risk in patients with SAE.

Keywords

Sepsis-associated encephalopathy, Medical Information Mart for Intensive Care IV, eICU Collaborative Research Database, stacking ensemble model, mortality risk prediction

Introduction

Sepsis-associated encephalopathy (SAE) refers to a series of neurological symptoms observed in patients with sepsis, resulting from systemic inflammatory responses leading to abnormal central nervous system function. In SAE, clinical or standard laboratory examinations do not reveal direct evidence of central nervous system infection, structural abnormalities, or other types of brain diseases, such as hepatic encephalopathy.¹ Brain involvement is considered a consequence of sepsis progressing to a severe stage.² The pathological mechanisms of SAE remain unclear, potentially involving alterations in cerebral microcirculation, dysfunction of the blood–brain barrier, mitochondrial impairment, neurotransmitter dysfunction, or involvement of inflammatory mediators and the complement system, among others.³ SAE is the most prevalent type of neurological disorder in the intensive care unit (ICU), with over 50% of patients with sepsis experiencing encephalopathy.²,⁴ Despite being a highly prevalent condition, definitive therapeutic approaches to treatment are lacking for SAE, which is characterized by a high mortality rate and poor prognosis. Early identification and management of patients with SAE at risk of death are crucial for averting severe complications and reducing mortality rates.⁵

To provide an exclusionary diagnosis for SAE, it is necessary to rule out pre-existing conditions such as chronic liver or renal failure, severe electrolyte imbalance, blood glucose disturbance, central nervous system infection, or pre-existing central nervous system diseases because specific biomarkers are lacking.⁶ Thus, the diagnosis of SAE is relatively challenging.⁷ As previously reported, Sequential Organ Failure Assessment (SOFA) and quick SOFA scores have been widely used as prognostic tools for sepsis and treatment of other infections in clinical practice.⁸,⁹ However, these still have limitations in terms of discriminative power and predictive accuracy.¹⁰ Currently, there is a lack of mature tools or methods to assess the mortality risk in patients with SAE. Therefore, it is imperative to establish a novel model for the effective and accurate prediction of SAE outcomes.

In recent years, the healthcare industry has increasingly incorporated machine learning (ML) into various clinical scenarios, particularly for predicting outcomes in critically ill patients.¹¹⁻¹³ The application of ML in the medical field using extensive datasets and predictive models enables clinical practitioners to approach challenges in patient care with increased assurance.¹⁴ The aim of this study was to establish and validate ML models for predicting in-hospital mortality among patients with SAE.

Methods

Data source

We collected pertinent data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database and eICU Collaborative Research Database for development and validation of the model. Data from MIMIC-IV originated from patients admitted to the ICU at Beth Israel Deaconess Medical Center, affiliated with Harvard Medical School in Boston, Massachusetts, USA, spanning the years 2008 to 2019. The eICU Collaborative Research Database is a multicenter database comprising anonymized health data from over 200,000 ICU admissions across the United States. The temporal scope of the data spans from 2014 to 2015. A researcher on our team successfully completed the Institutional Review Board (IRB) examination regarding the protection of human research participants (ID number 50618389) and obtained access credentials for these two databases. The MIMIC-IV database (version 1.0) is publicly accessible at https://physionet.org/content/mimiciv/1.0/. The eICU database is openly accessible at https://eICU-crd.mit.edu/about/eICU/.

All patient details have been de-identified to maintain confidentiality and privacy. The data presented in this study have undergone a thorough de-identification process to prevent any form of patient identification. The study protocol received approval from the Collaborative Institutional Training Initiative (CITI Program) Ethics Review Committee. The affiliated institution is the Massachusetts Institute of Technology Affiliates (ID: 1912); the approval number is 50618389, and the study approval was granted on 12 August 2022. Written informed consent for participation was not required for this study. The study complies with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.¹⁵ This study was conducted in accordance with the principles of the 2013 Helsinki Declaration.

Study population

According to past references, we defined SAE as sepsis occurring upon admission to the ICU, accompanied by Glasgow Coma Scale (GCS) score <15, or delirium (International Classification of Diseases [ICD] codes 2930, 2931), or cognitive impairment (ICD codes 33183, G3184).¹⁶ We excluded patients with delirium caused by alcohol dependence, drug dependence, and other primary neurological disorders.

The inclusion criteria were (1) patients conforming to the diagnostic criteria of Sepsis-3;¹⁷ (2) those who met the definition of SAE; (3) age ≥18 years; (4) first hospitalization and initial admission to the ICU; and (5) ICU stay >24 hours. The exclusion criteria were (1) primary brain injury (such as head trauma, intracranial bleeding, cerebral contusion, skull fractures, cerebral embolism, ischemic stroke, epilepsy, intracranial infection, and other cerebrovascular diseases); (2) psychiatric disorders; (3) substance abuse, drug dependency, or alcohol dependence; (4) conditions affecting consciousness including hepatic coma, hypoglycemic coma, or hypertensive encephalopathy; and (5) severe electrolyte disturbances or glucose abnormalities, including hyponatremia (<120 mmol/L), hyperglycemia (>180 mg/dL), or hypoglycemia (<54 mg/dL), as shown in the Supplementary materials.

Data collection and results

This study was a retrospective cohort investigation involving data collection across several dimensions: (1) demographic characteristics including age and sex; (2) antibiotic treatment; (3) vital signs on the first day including heart rate, respiratory rate, mean blood pressure, temperature, and arterial oxygen saturation (SpO₂); (4) scale scores on the first day in the ICU including SOFA score, systemic inflammatory response syndrome (SIRS) score, and GCS; (5) laboratory test results on the first day in the ICU including hematocrit, hemoglobin, platelets, white blood cells (WBC), anion gap, bicarbonate, blood urea nitrogen (BUN), calcium, glucose, chloride, creatinine, sodium, potassium, absolute (Abs) basophils, Abs eosinophils, Abs lymphocytes, Abs monocytes, Abs neutrophils, international normalized ratio (INR), prothrombin time, and partial thromboplastin time (PTT); (6) complications including myocardial infarction, congestive heart failure, chronic pulmonary disease, diabetes, hypertension, and septic shock; and (7) ICU length of stay. We used laboratory test results and blood biomarker levels measured within the first day of ICU admission. In cases where multiple measurements were taken on the initial day, we considered the minimum value for each indicator. The primary outcome event for this study was in-hospital mortality among patients with SAE.
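The first-day aggregation rule described above (keeping the minimum when an indicator was measured more than once) can be sketched as follows. This is an illustrative Python sketch and not the authors' extraction code; the record layout, patient identifiers, and values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical first-day measurements: (patient_id, indicator, value).
records = [
    (1, "SpO2", 94.0),
    (1, "SpO2", 91.0),
    (1, "GCS", 13.0),
    (2, "SpO2", 97.0),
    (2, "GCS", 9.0),
    (2, "GCS", 11.0),
]

def first_day_minimum(rows):
    """Return {patient_id: {indicator: minimum value}} over the given rows."""
    result = defaultdict(dict)
    for patient_id, indicator, value in rows:
        current = result[patient_id].get(indicator)
        if current is None or value < current:
            result[patient_id][indicator] = value
    return {pid: dict(values) for pid, values in result.items()}

first_day = first_day_minimum(records)
print(first_day[1]["SpO2"])  # 91.0
print(first_day[2]["GCS"])   # 9.0
```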

Statistical analysis

We used multiple imputation to fill in missing values for the variables. Continuous variables were subjected to the Kolmogorov–Smirnov test to assess the distribution of the data. Parametric continuous variables were assessed using t-tests and are presented as mean and standard deviation. Non-parametric continuous variables were evaluated with the Mann–Whitney U test and are expressed as median with the interquartile range (IQR). Categorical variables are presented as number (percentage) and were assessed using the χ² test or Fisher’s exact test. All statistical tests were two-tailed. A significance level of P < 0.05 was considered statistically significant. Statistical analysis was performed using IBM SPSS 27.0 software (IBM Corp., Armonk, NY, USA). ML models were developed and validated using R software version 4.3.0 (www.r-project.org).
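As a worked illustration of the χ² test for a categorical variable, the sketch below applies a Pearson χ² test to the sex-by-outcome counts reported in Table 1. It is written in Python for illustration only (the authors report using SPSS), and it assumes no continuity correction was applied, which the source does not state. Under that assumption, the resulting P-value should land close to the 0.024 reported for sex.

```python
import math

# Sex-by-outcome counts from Table 1 (rows: female, male; columns: survival, death).
observed = [[2425, 537], [3014, 575]]

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and P-value, 1 df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_square_2x2(observed)
print(round(stat, 2), round(p_value, 3))
```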

Feature selection

We used the least absolute shrinkage and selection operator (LASSO) for feature selection. LASSO is a regularization method frequently applied in linear regression and classification, incorporating an L1 regularization term in the loss function to automatically select features that exert a significant impact on the target variable.¹⁸ LASSO can mitigate overfitting to noise in the training set, thereby enhancing the model’s generalization capability and interpretability.¹⁹
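One way to see why an L1 penalty selects features is the soft-thresholding step used by common LASSO solvers: each coefficient is shrunk toward zero by the penalty, and coefficients smaller than the penalty become exactly zero. The Python sketch below illustrates only this step. The coefficient values and the penalty strength are hypothetical, and the sketch does not reproduce the authors' R analysis.

```python
def soft_threshold(value, penalty):
    """Shrink a coefficient toward zero by `penalty`; return exactly 0 if it is smaller."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

# Hypothetical unpenalized coefficients on standardized features.
coefficients = {"GCS": -0.90, "age": 0.50, "hematocrit": 0.03, "platelets": -0.02}
penalty = 0.05  # hypothetical L1 penalty strength (lambda)

shrunk = {name: soft_threshold(beta, penalty) for name, beta in coefficients.items()}
selected = [name for name, beta in shrunk.items() if beta != 0.0]
print(selected)  # ['GCS', 'age']
```

Features whose coefficients are driven to zero drop out of the model, which is how a larger penalty yields a shorter feature list.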

Model development and validation

We used nine different ML algorithms and a stacking ensemble algorithm. For the stacking ensemble, we selected the three top-performing models from the pool of nine (elastic net [ENet], support vector machine [SVM], and extreme gradient boosting [XGBoost]) and combined their predictions through a meta-model. We ultimately developed 10 ML models: logistic regression (LR), decision tree (DT), ENet, K-nearest neighbor (KNN), light gradient boosting machine (LightGBM), random forest (RF), XGBoost, SVM, multi-layer perceptron (MLP), and the stacking ensemble model. The “initial_split” function was used to randomly partition the dataset into training and testing sets in a 7:3 ratio. We conducted five-fold cross-validation and applied Bayesian optimization for hyperparameter tuning to reduce overfitting. To gain a more comprehensive understanding of the performance of the 10 models, we also measured their accuracy, sensitivity, specificity, recall, and F1 score. The optimal model was selected by comparing the area under the receiver operating characteristic (ROC) curve (AUC). We also generated ROC curves, clinical decision curve analysis (DCA) curves, and calibration curves to visualize the performance of the models. Finally, the Shapley additive explanations (SHAP) algorithm was adopted to quantify the contribution of each feature to the predictions made by the optimal model. Two individual cases were also analyzed to illustrate how the model arrives at its output. The optimal model was deployed as a web-based tool on a Shiny web page for public access.
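The idea behind stacking can be sketched in a few lines: base learners produce out-of-fold predictions under cross-validation, and a meta-model is then trained on those predictions. The Python sketch below is a minimal illustration under strong assumptions and is not the authors' R pipeline. The data are synthetic, the three base learners are simple logistic regressions on different feature subsets (stand-ins only, not ENet, SVM, or XGBoost), and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    return sigmoid(w[0] + sum(wj * v for wj, v in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Logistic regression by batch gradient descent; returns [bias, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def out_of_fold(X, y, cols, k=5):
    """Predict each row with a base learner trained on the other k-1 folds."""
    oof = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        w = fit_logistic([[X[i][c] for c in cols] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):
            oof[i] = predict(w, [X[i][c] for c in cols])
    return oof

# Hypothetical synthetic data: three standardized features and a binary outcome.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(-1.5 + 1.2 * x[0] - 0.8 * x[1]) else 0 for x in X]

# Three stand-in base learners, each seeing a different feature subset.
base_columns = [[0], [1], [0, 1, 2]]
meta_features = list(zip(*(out_of_fold(X, y, cols) for cols in base_columns)))

# The meta-model learns how to weight the base learners' out-of-fold predictions.
meta_weights = fit_logistic(meta_features, y)
stacked_risk = [predict(meta_weights, row) for row in meta_features]
print(len(meta_weights), round(min(stacked_risk), 3), round(max(stacked_risk), 3))
```

Training the meta-model on out-of-fold predictions, rather than on predictions for rows the base learners have already seen, keeps the meta-model from rewarding base learners that merely memorize the training data.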

For comprehensive assessment of performance of the ML model, we conducted a comparison of its predictive capability with that of the SOFA and SIRS scores, which are currently prevalent in mortality assessment among patients with SAE. We used ROC curves as a comparative tool to quantify the differences in performance of the ML models relative to SOFA and SIRS scores.
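The AUC used for this comparison can be read as the probability that a randomly chosen non-survivor receives a higher score than a randomly chosen survivor. The Python sketch below computes it by direct pairwise comparison, which works equally for model probabilities and for integer scores such as SOFA. The outcomes and scores shown are hypothetical toy values, not study data.

```python
def auc(labels, scores):
    """Probability that a random positive case scores higher than a random negative one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical outcomes (1 = died in hospital) and two hypothetical risk scores.
died = [0, 0, 1, 1]
model_risk = [0.10, 0.40, 0.35, 0.80]
sofa_score = [3, 5, 3, 4]

print(auc(died, model_risk))  # 0.75
print(auc(died, sofa_score))  # 0.375
```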

Results

Baseline characteristics

We included 6551 patients with SAE from the MIMIC-IV database, among whom 1112 died in hospital. We enrolled 3392 patients with SAE from the eICU database (Supplementary Table 1). The flowchart of the case screening process is depicted in Figure 1. Patients’ baseline characteristics are presented in Table 1. Among included patients, the median age was 70 years (IQR, 58–81 years), and 3589 individuals (54.79%) were men. Hypertension was the most common comorbidity (2816 cases, 42.99%), followed by congestive heart failure (2129 cases, 32.50%). Comparison of the survival and death groups showed statistically significant differences (all P < 0.001) in age, SOFA score, SIRS score, GCS score, heart rate, systolic blood pressure (SBP), diastolic blood pressure (DBP), mean blood pressure, respiratory rate, temperature, SpO₂, hemoglobin, WBC, anion gap, bicarbonate, BUN, calcium, chloride, creatinine, glucose, potassium, Abs basophils, Abs eosinophils, Abs lymphocytes, INR, prothrombin time (PT), PTT, myocardial infarction, congestive heart failure, hypertension, septic shock, antibiotic treatment, and ICU length of stay (LOS).

Figure 1.

Study flowchart. (a) Model construction workflow diagram and (b) case screening process flowchart. MIMIC-IV, Medical Information Mart for Intensive Care IV; eICU, eICU Collaborative Research Database; ICU, intensive care unit; GCS, Glasgow Coma Scale; SHAP, Shapley additive explanations; ML, machine learning.

Table 1.

Baseline characteristics of patients with SAE.

Variables	Total (N = 6551)	Survival (n = 5439)	Death (n = 1112)	Statistic	P-value
Demographic variables
Sex, n (%)					0.024
Female		2,425 (37)	537 (8)
Male		3,014 (46)	575 (9)
Age (years), M (Q1, Q3)	70.00 (58.00, 81.00)	69.00 (57.00, 80.00)	75.00 (63.00, 84.00)	Z = 9.30	<0.001
Severity of illness
SOFA, M (Q1, Q3)	3.00 (2.00, 5.00)	3.00 (2.00, 4.00)	4.00 (2.00, 5.00)	Z = 8.35	<0.001
SIRS, mean (SD)	2.90 (0.87)	2.87 (0.88)	3.05 (0.82)	t = 6.74	<0.001
GCS, M (Q1, Q3)	12.00 (8.00, 14.00)	13.00 (9.00, 14.00)	7.00 (3.00, 11.00)	Z = −25.31	<0.001
Vital signs
Heart rate (beats/min), M (Q1, Q3)	71.00 (61.00, 83.00)	71.00 (61.00, 82.00)	74.00 (62.00, 87.00)	Z = 4.56	<0.001
SBP (mmHg), mean (SD)	87.58 (15.88)	88.13 (15.55)	84.85 (17.11)	t = −5.92	<0.001
DBP (mmHg), mean (SD)	43.99 (10.46)	44.37 (10.19)	42.16 (11.52)	t = −5.93	<0.001
MBP (mmHg), mean (SD)	55.81 (12.93)	56.28 (12.60)	53.47 (14.23)	t = −6.11	<0.001
Respiratory rate (breaths/min), M (Q1, Q3)	13.00 (10.00, 15.00)	12.00 (10.00, 15.00)	14.00 (11.00, 16.00)	Z = 8.39	<0.001
Temperature (°C), M (Q1, Q3)	36.39 (35.94, 36.67)	36.44 (36.00, 36.71)	36.33 (35.68, 36.61)	Z = −6.88	<0.001
SpO₂ (%), M (Q1, Q3)	92.00 (90.00, 95.00)	93.00 (90.00, 95.00)	91.00 (87.00, 94.00)	Z = −8.60	<0.001
Laboratory tests
Hematocrit (%), mean (SD)	29.84 (6.57)	29.90 (6.50)	29.56 (6.90)	t = −1.52	0.128
Hemoglobin (g/dL), mean (SD)	9.83 (2.19)	9.88 (2.16)	9.61 (2.28)	t = −3.58	<0.001
Platelets (K/μL), M (Q1, Q3)	165.00 (102.00, 233.00)	165.00 (114.00, 232.00)	164.00 (101.00, 239.00)	Z = −1.74	0.083
WBC (K/μL), M (Q1, Q3)	10.10 (6.30, 13.90)	9.90 (7.00, 13.60)	10.95 (7.20, 15.90)	Z = 5.43	<0.001
Anion gap (mEq/L), M (Q1, Q3)	13.00 (11.00, 15.00)	12.00 (11.00, 15.00)	14.00 (12.00, 17.00)	Z = 12.75	<0.001
Bicarbonate (mmol/L), mean (SD)	20.73 (5.16)	20.99 (4.99)	19.48 (5.75)	t = −8.15	<0.001
BUN (mg/dL), M (Q1, Q3)	21.00 (14.00, 34.00)	19.00 (13.00, 32.00)	28.00 (17.00, 47.00)	Z = 14.33	<0.001
Calcium (mg/dL), mean (SD)	7.91 (0.88)	7.93 (0.86)	7.81 (0.98)	Z = −3.87	<0.001
Chloride (mEq/L), M (Q1, Q3)	102.00 (98.00, 106.00)	103.00 (99.00, 106.00)	101.00 (97.00, 106.00)	Z = −6.05	<0.001
Creatinine (mg/dL), M (Q1, Q3)	1.00 (0.70, 1.40)	0.90 (0.70, 1.40)	1.20 (0.80, 2.00)	Z = 10.28	<0.001
Glucose (mg/dL), M (Q1, Q3)	102.00 (84.00, 123.00)	101.00 (87.00, 121.00)	106.00 (88.00, 131.75)	Z = 4.45	<0.001
Sodium (mEq/L), mean (SD)	136.92 (5.13)	137.01 (4.98)	136.49 (5.78)	t = −2.80	0.005
Potassium (mEq/L), mean (SD)	3.88 (0.59)	3.87 (0.57)	3.96 (0.70)	t = 3.86	<0.001
Abs basophils (K/μL), M (Q1, Q3)	0.04 (0.00, 1.59)	0.05 (0.00, 2.22)	0.02 (0.00, 1.33)	Z = −9.32	<0.001
Abs eosinophils (K/μL), M (Q1, Q3)	0.12 (0.00, 3.21)	0.15 (0.00, 5.51)	0.01 (0.00, 2.03)	Z = −9.44	<0.001
Abs lymphocytes (K/μL), M (Q1, Q3)	38.72 (1.16, 109.50)	42.00 (1.24, 113.27)	26.90 (0.83, 90.24)	Z = −6.31	<0.001
Abs monocytes (K/μL), M (Q1, Q3)	15.66 (0.69, 45.76)	16.60 (0.69, 45.32)	10.25 (0.68, 47.45)	Z = −0.87	0.382
Abs neutrophils (K/μL), M (Q1, Q3)	436.20 (11.13, 1046.19)	451.92 (11.11, 1035.60)	330.76 (11.21, 1108.56)	Z = 0.82	0.410
INR, M (Q1, Q3)	1.20 (1.10, 1.40)	1.20 (1.10, 1.40)	1.30 (1.10, 1.60)	Z = 10.63	<0.001
PT (s), M (Q1, Q3)	13.50 (12.20, 15.60)	13.40 (12.10, 15.20)	14.45 (12.60, 17.70)	Z = 10.79	<0.001
PTT (s), M (Q1, Q3)	28.70 (25.70, 33.20)	28.50 (25.60, 32.70)	30.50 (26.40, 36.50)	Z = 9.31	<0.001
Comorbidities, n (%)
Myocardial infarction		311 (10)	141 (5)		<0.001
Congestive heart failure		1,699 (26)	430 (7)		<0.001
Chronic pulmonary disease		1,462 (22)	330 (5)		0.057
Peripheral vascular disease		622 (9)	158 (2)		0.009
Diabetes		1,640 (25)	305 (5)		0.070
Hypertension		2,398 (37)	418 (7)		<0.001
Septic shock		1,092 (17)	442 (7)		<0.001
Antibiotic treatment		4,492 (69)	836 (13)		<0.001
Outcome
LOS (day), M (Q1, Q3)	3.88 (2.18, 7.60)	3.67 (2.10, 7.09)	5.26 (2.82, 10.02)	Z = 6.50	<0.001

SOFA, Sequential Organ Failure Assessment; SIRS, systemic inflammatory response syndrome; GCS, Glasgow Coma Scale; SBP, systolic blood pressure; DBP, diastolic blood pressure; MBP, mean blood pressure; SpO₂, arterial oxygen saturation; WBC, white blood cells; BUN, blood urea nitrogen; Abs, absolute; INR, international normalized ratio; PT, prothrombin time; PTT, partial thromboplastin time; LOS, length of intensive care unit stay; M, median; SD, standard deviation.

Developing and validating the model

We collected a total of 40 clinical features (Supplementary Figure 1). After LASSO regression screening, 15 features remained for model development (Supplementary Figure 1b): GCS score, age, SpO₂, BUN, INR, chloride, temperature, respiratory rate, PTT, anion gap, WBC, hypertension, diabetes, septic shock, and antibiotic treatment.

The cohort of 6551 patients was randomly divided into a training set comprising 4585 individuals (70%) and a testing set comprising 1966 individuals (30%) (Supplementary Table 2). In comparing the performance metrics, the stacking ensemble model exhibited the highest AUC (0.807, 95% confidence interval [CI]: 0.783–0.831) on the testing set. According to the values in Table 2, KNN had the highest accuracy and specificity, ENet had the highest sensitivity and recall, and the stacking ensemble model had the highest F1 score. We plotted the ROC curves (Figure 2a), and the results demonstrated that the stacking ensemble model achieved the highest AUC. Figure 2b and Supplementary Figure 2 respectively illustrate the DCA curve and calibration curve of the models on the testing set, with different colors representing the different models. The DCA curve demonstrates that XGBoost exhibited favorable net benefit within the threshold probability range of 5% to 94%. The calibration curves indicate that XGBoost and LightGBM had the highest alignment with the 45-degree diagonal, followed by the stacking ensemble model. The higher the degree of coincidence, the closer the correspondence between the model predictions and the actual observations.
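The reported set sizes are consistent with rounding 70% of 6551 down to a whole number. The Python sketch below shows a seeded random 7:3 partition that reproduces those sizes. It is an illustration only: the authors used the “initial_split” function in R, and the helper name, seed, and integer patient indices here are hypothetical.

```python
import random

def split_70_30(ids, seed=0):
    """Shuffle the ids with a fixed seed and split them 70/30, rounding the 70% down."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = (len(shuffled) * 7) // 10
    return shuffled[:n_train], shuffled[n_train:]

train_ids, test_ids = split_70_30(range(6551))
print(len(train_ids), len(test_ids))  # 4585 1966
```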

Table 2.

Performance metrics of 10 machine learning models.

ID	Model	Accuracy	Sensitivity	Specificity	Recall	F-score	AUC (95% CI)
1	LR	0.721	0.734	0.718	0.734	0.472	0.797 (0.773–0.822)
2	DT	0.714	0.647	0.728	0.647	0.435	0.734 (0.704–0.764)
3	ENet	0.691	0.751	0.679	0.751	0.453	0.796 (0.772–0.821)
4	KNN	0.762	0.524	0.811	0.524	0.428	0.744 (0.715–0.773)
5	LightGBM	0.721	0.728	0.720	0.728	0.470	0.802 (0.777–0.826)
6	RF	0.748	0.680	0.762	0.680	0.478	0.801 (0.776–0.826)
7	XGBoost	0.732	0.716	0.735	0.716	0.476	0.800 (0.776–0.825)
8	SVM	0.722	0.743	0.718	0.743	0.476	0.796 (0.771–0.820)
9	MLP	0.727	0.701	0.732	0.701	0.466	0.796 (0.771–0.820)
10	Stacking ensemble model	0.744	0.713	0.751	0.713	0.486	0.807 (0.783–0.831)

LR, logistic regression; DT, decision tree; ENet, elastic net; KNN, K-nearest neighbor; LightGBM, light gradient boosting machine; RF, random forest; XGBoost, extreme gradient boosting; SVM, support vector machine; MLP, multi-layer perceptron.
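The Sensitivity and Recall columns in Table 2 are identical because the two terms name the same quantity. The remaining metrics are also linked: given sensitivity, specificity, and the proportion of deaths, accuracy and the F-score follow arithmetically. The Python sketch below checks this for the stacking ensemble row. It assumes that the testing set has the same mortality proportion as the full MIMIC-IV cohort (1112/6551), which the source does not state, so the derived values should be close to, but need not exactly equal, the tabulated ones.

```python
def metrics_from_rates(sensitivity, specificity, prevalence):
    """Derive accuracy, precision, and F1 from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    precision = true_pos / (true_pos + false_pos)
    recall = sensitivity  # recall and sensitivity are the same quantity
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": true_pos + true_neg, "precision": precision, "f1": f1}

# Assumption: the testing set has the same mortality proportion as the full cohort.
prevalence = 1112 / 6551

# Sensitivity and specificity of the stacking ensemble model from Table 2.
stacking = metrics_from_rates(0.713, 0.751, prevalence)
print({name: round(value, 3) for name, value in stacking.items()})
```

The low precision implied by these figures explains why the F-scores sit below 0.5 even though sensitivity and specificity both exceed 0.7: deaths are the minority class, so false positives outnumber true positives.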

Figure 2.

Line chart. (a) Discriminative ability of the 10 models compared using receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) and (b) decision curve analysis (DCA) curves for the 10 models. LR, logistic regression; DT, decision tree; ENet, elastic net; KNN, K-nearest neighbor; LightGBM, light gradient boosting machine; RF, random forest; XGBoost, extreme gradient boosting; SVM, support vector machine; MLP, multi-layer perceptron.

In the external validation set, the stacking ensemble model achieved an AUC of 0.671 (95% CI, 0.647–0.695), with accuracy, sensitivity, and recall rates of 0.476, 0.808, and 0.808, respectively. The ROC curve is depicted in Supplementary Figure 3. This indicates that the model retained discriminative ability on external data, although with lower performance than on the internal testing set (Supplementary Table 3).

Model interpretability

To delve further into mortality prediction using the stacking ensemble model, we used the SHAP algorithm to analyze the model’s output process. The feature importance of the stacking ensemble model is illustrated in Figure 3a, where GCS and age emerge as the two most crucial variables. We used SHAP summary plots to illustrate the overall positive and negative impacts of continuous and categorical variables on the output of the stacking ensemble model. Among the categorical variables, septic shock had the greatest value in the model (Figure 3b). Among continuous variables, GCS had the greatest value in the model (Figure 3c). Figure 4 depicts a univariate distribution plot, which indicates that with decreasing GCS score and increasing age, the importance of these factors in predicting the model’s outcomes also increases.
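SHAP values are based on Shapley values, which attribute a prediction to individual features by averaging each feature's marginal contribution over all subsets of the other features. A defining property is additivity: the baseline plus the sum of the attributions equals the model output. The Python sketch below computes exact Shapley values for a toy three-feature risk function. The function, its baseline of 0.17, and its increments are hypothetical and are not taken from the fitted stacking ensemble model; practical SHAP implementations use approximations rather than full enumeration.

```python
from itertools import combinations
from math import factorial

def toy_risk(known):
    """Hypothetical predicted risk when only the features in `known` are revealed."""
    risk = 0.17  # hypothetical baseline (average prediction)
    if "septic_shock" in known:
        risk += 0.06
    if "gcs" in known:
        risk += 0.04
    if "age" in known:
        risk -= 0.08
    if "septic_shock" in known and "gcs" in known:
        risk += 0.02  # hypothetical interaction, shared between both features
    return risk

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                total += weight * (value_fn(known | {feature}) - value_fn(known))
        phi[feature] = total
    return phi

features = ["septic_shock", "gcs", "age"]
phi = shapley_values(toy_risk, features)
baseline = toy_risk(set())
output = toy_risk(set(features))

print({name: round(value, 3) for name, value in phi.items()})
print(round(baseline + sum(phi.values()), 3), round(output, 3))
```

In a force or waterfall plot, the bars correspond to these per-feature attributions, and they span the distance between the baseline and the model output for that patient.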

Figure 3.

SHAP plot. (a) SHAP feature importance. The greater the variable importance, the longer the corresponding bar. (b) Boxplot: importance of categorical variables and (c) Beeswarm plot: swarm plot for continuous variables. GCS, Glasgow Coma Scale; SpO₂, arterial oxygen saturation; WBC, white blood cell; BUN, blood urea nitrogen; INR, international normalized ratio; PTT, partial thromboplastin time; SHAP, Shapley additive explanations.

Figure 4.

Univariate distribution plot. (a) Glasgow Coma Scale (GCS) and (b) Age. Yellow dots represent deceased patients; purple dots represent surviving patients. Red line depicts overall trend in the Shapley additive explanations (SHAP) values of the variable. Blue histogram illustrates the sample count corresponding to different SHAP values.

We selected two samples and used the SHAP algorithm to analyze the prediction results of the stacking ensemble model. Figure 5 depicts SHAP force plots for the samples, in which different colors denote varying contributions to the predicted outcome, which we defined as death. Figure 5a depicts a deceased patient, for whom septic shock = 1 and GCS = 9 contributed most strongly toward the outcome of death, and age = 53 years contributed most strongly against it. The model’s output value was 0.18, which is above the baseline value of 0.17. In this case, factors such as septic shock pushed the prediction toward death whereas factors such as age 53 years pushed it toward survival. The length of the red bars exceeded that of the blue bars, predicting the outcome as death, which is in alignment with the actual outcome for this patient. Figure 5b illustrates a surviving patient, for whom age = 96 years and GCS = 10 contributed toward the outcome of death, and septic shock = 0 and anion gap = 10 contributed against it. The model’s output value was 0.12, which is below the baseline value of 0.17. In this case, advanced age and GCS = 10, among other factors, pushed the prediction toward death whereas the absence of septic shock and anion gap = 10, among other factors, pushed it toward survival. The length of the blue bars exceeded that of the red bars, predicting the outcome as survival, which is consistent with the actual outcome for this patient. Figures 5c and 5d present waterfall plots for the same two cases as an alternative depiction of the SHAP force plots. Figure 5c represents the deceased patient, and Figure 5d represents the surviving patient. These plots show that GCS score and age are the two most crucial variables influencing the model output.

Figure 5.

Bar chart. (a) Force plots for deceased patients. (b) Force plots for surviving patients. For force plots, red denotes a positive effect on the model outcome, and blue signifies a negative effect on the model outcome. The longer the bar, the greater the importance of the feature. (c) Waterfall plots for deceased patients and (d) Waterfall plots for surviving patients. For waterfall plots, yellow indicates a positive impact on the outcome of death, and purple signifies a negative impact on the outcome of death, with the length representing the contribution value. GCS, Glasgow Coma Scale; SpO₂, arterial oxygen saturation; WBC, white blood cell; BUN, blood urea nitrogen; INR, international normalized ratio; PTT, partial thromboplastin time; SHAP, Shapley additive explanations.

The results of a comparison between the optimal model and SOFA score, as well as the SIRS score, revealed that the AUC for the SOFA score is 0.577 (95% CI, 0.558–0.595), and the AUC for the SIRS score is 0.556 (95% CI, 0.538–0.574) (Supplementary Figure 4). These results indicated that the optimal model outperforms the SOFA score and SIRS score in predicting in-hospital mortality among patients with SAE.

Application of the optimal model

We additionally created a website (https://sic1.shinyapps.io/SSAAEE/) and deployed the stacking ensemble model on the Shiny web platform. This provides a convenient platform for peer communication and utilization to facilitate the assessment of mortality risk in patients with SAE.

Discussion

There remains a paucity of research on predicting the in-hospital mortality risk among patients with SAE in the ICU. Based on 15 clinical features within the first 24 hours of ICU admission, we developed and validated 10 ML models to predict the in-hospital mortality risk among patients with SAE. Compared with the other models, the stacked ensemble model showed a small performance advantage.

As an amalgamation of computer science and statistics, ML processes data in a semi-automated manner and creates intricate models within learning frameworks, thereby yielding accurate diagnostic algorithms and personalized patient treatments.²⁰ As an excellent algorithm, the stacking ensemble model has been applied in the development of disease prediction models. Huangbo and colleagues successfully constructed a stacking ensemble model that could predict the 6-month mortality rate in patients with ischemic stroke.²¹ Gupta et al. successfully used a stacking ensemble ML approach to predict the risk of cardiac complications following COVID-19 infection.²² Peng et al. explored predictors and developed ML models for predicting 30-day mortality in patients with SAE, using the MIMIC-IV database. Their evaluation, including metrics such as AUC, accuracy, and calibration performance, adds valuable insights to the understanding of prognostic modeling in critical care settings.²³ In the context of our study, we used 10 ML models based on the MIMIC-IV database to predict in-hospital mortality risk among patients with SAE, with the stacked ensemble model emerging as the most effective.

To assess the generalization performance of our model across diverse datasets, we conducted external validation using the eICU database. However, the stacking ensemble model had an AUC of 0.671 for the external validation set, substantially lower than the 0.807 observed for the internal testing set. Several factors may explain this disparity. First, the internal and external datasets originated from distinct distributions, and the patient characteristics varied among different medical centers. This disparity may lead to a decline in model performance when using external data. Additionally, differences in sample size could impact the model’s performance because smaller external datasets may offer relatively limited information.

We found that the two most significant variables influencing the optimal model output were GCS and age. The GCS is commonly used to assess the severity of injury and illness, helping with classification and intervention and enabling timely detection of changes in consciousness.²⁴ Despite the fluctuating accuracy of the GCS, this scale remains a commonly used tool for distinguishing survivors and non-survivors among trauma patients.²⁵ Research indicates that the mortality rate in brain disorders is associated with GCS scores; the mortality rate is 16% with a GCS score of 15, 20% with scores ranging from 13 to 14, 50% with scores between 9 and 12, and 63% with scores from 3 to 8.²⁶ Our model also demonstrated that a lower GCS score has a greater impact on the model output. In addition, the impact on the model output was greater with increasing age. Advanced age may be associated with inflammatory aging in older people, which has clear involvement in cardiovascular disease and metabolic abnormalities in sarcopenia.²⁷ Opal and colleagues observed that older patients with sepsis had a significantly higher mortality rate than their younger counterparts. This elevation in mortality is possibly attributable to advanced age inducing innate immune response abnormalities, thereby influencing sepsis.²⁸ The assessment of other continuous variables aligns with medical knowledge, where a greater deviation from the normal range corresponds to a more substantial impact on the model output.

To the best of our knowledge, this study represents the first attempt at using a stacked ensemble model to predict the mortality risk in SAE. The stacked ensemble model, distinguished for its ability to amalgamate advantages from multiple foundational models, exhibits heightened generalization capabilities when using novel data. Particularly in clinical scenarios when predicting high-risk patients with SAE across multicenter ICUs, this model’s robustness when using new data is notable. Furthermore, relative to singular models, the stacked ensemble model demonstrates enhanced resilience when dealing with noise or outliers owing to its capacity to harmonize predictions from individual sub-models. In contrast to certain deep learning models, the stacked ensemble model affords a higher degree of interpretability. In our study, we conducted external validation of the predictive models using the eICU database, with results consistently showcasing robust performance of the predictive models. We also developed an online tool to facilitate convenient use of our predictive model by clinical practitioners, which can provide valuable insights for clinical decision-making.

This study has several limitations. Prospective data for validation of our predictive models and corresponding controlled studies to verify the improvement in clinical outcomes are lacking. Addressing these limitations will be a focus of our future research efforts.

Conclusion

In this study, we developed a stacked ensemble model with improved generalization to new data for predicting mortality risk in patients with SAE. This clinical support model can increase physician awareness and help identify high-risk patients with SAE. By facilitating timely interventions, the model may help healthcare professionals address the needs of these patients proactively, thereby contributing to better clinical outcomes and prognosis.

Supplemental Material

Supplemental material, sj-pdf-1-imr-10.1177_03000605241239013 for Enhancing predictions with a stacking ensemble model for ICU mortality risk in patients with sepsis-associated encephalopathy by Xuhui Liu, Hao Niu and Jiahua Peng in Journal of International Medical Research

Supplemental material, sj-pdf-2-imr-10.1177_03000605241239013 for Enhancing predictions with a stacking ensemble model for ICU mortality risk in patients with sepsis-associated encephalopathy by Xuhui Liu, Hao Niu and Jiahua Peng in Journal of International Medical Research

Footnotes

Authors’ contributions

H.N. was responsible for data collection. X.L. was in charge of statistical analysis, model construction, and manuscript writing. J.P. critically revised the article. All authors read and approved the final manuscript.

Data availability statement

The datasets supporting the conclusions of this article are available in the MIMIC-IV (version 1.0) repository at (https://physionet.org/content/mimiciv/1.0/). The eICU database is openly accessible at (). The code used for data extraction and analysis can be requested from the corresponding author.

Declaration of conflicting interest

The authors declare that there is no conflict of interest.

Funding

This research received no specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

ORCID iD

Xuhui Liu

Supplemental material

Supplemental material for this article is available online.

References

1. Zhang, Wang, et al. Epidemiological features and risk factors of sepsis-associated encephalopathy in intensive care unit patients: 2008–2011. Chin Med J (Engl) 2012; 125: 828–831.

2. Golzari, Mahmoodpoor. Sepsis-associated encephalopathy versus sepsis-induced encephalopathy. Lancet Neurol 2014; 13: 967–968.

3. Zampieri, Park, Machado, et al. Sepsis-associated encephalopathy: not just delirium. Clinics (Sao Paulo) 2011; 66: 1825–1831.

4. Iacobone, Bailly-Salin, Polito, et al. Sepsis-associated encephalopathy and its differential diagnosis. Crit Care Med 2009; 37: S331–S336.

5. Young, Bolton, Austin, et al. The encephalopathy associated with septic illness. Clin Invest Med 1990; 13: 297–304.

6. Gofton, Young GB. Sepsis-associated encephalopathy. Nat Rev Neurol 2012; 8: 557–566.

7. Ebersoldt, Sharshar, Annane. Sepsis-associated delirium. Intensive Care Med 2007; 33: 941–950.

8. Fukushima, Kobayashi, Kawano, et al. Performance of Quick Sequential (Sepsis Related) and Sequential (Sepsis Related) Organ Failure Assessment to Predict Mortality in Patients with Acute Pyelonephritis Associated with Upper Urinary Tract Calculi. J Urol 2018; 199: 1526–1533.

9. Ranzani, Prina, Menéndez, et al. New Sepsis Definition (Sepsis-3) and Community-acquired Pneumonia Mortality. A Validation and Clinical Decision-Making Study. Am J Respir Crit Care Med 2017; 196: 1287–1297.

10. Demirjian, Chertow, Zhang, et al. Model to predict mortality in critically ill adults with acute kidney injury. Clin J Am Soc Nephrol 2011; 6: 2114–2120. DOI: 10.2215/CJN.02900311.

11. Kang, Zhou, et al. Prediction and risk assessment of sepsis-associated encephalopathy in ICU based on interpretable machine learning. Sci Rep 2022; 12: 22621.

12. Deng, Chen, et al. Machine learning for early prediction of sepsis-associated acute brain injury. Front Med (Lausanne) 2022; 9: 962027.

13. Zhao, Wang, et al. Mechanical Learning for Prediction of Sepsis-Associated Encephalopathy. Front Comput Neurosci 2021; 15: 739265.

14. Bzdok, Krzywinski, Altman. Machine learning: A primer. Nat Methods 2017; 14: 1119–1120.

15. Moons, Altman, Reitsma, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Ann Intern Med 2015; 162: W1–W73. DOI: 10.7326/M14-0698.

16. Mazeraud, Righy, Bouchereau, et al. Septic-Associated Encephalopathy: a Comprehensive Review. Neurotherapeutics 2020; 17: 392–403.

17. Singer, Deutschman, Seymour, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA 2016; 315: 801–810.

18. Guyon, Elisseeff. An Introduction to Variable and Feature Selection. J Machine Learning Research 2003; 3: 1157–1182.

19. Connor, Hollensen, Krigolson, et al. A biological mechanism for Bayesian feature selection: Weight decay and raising the LASSO. Neural Netw 2015; 67: 121–130.

20. Handelman, Kok, Chandra, et al. eDoctor: machine learning and the future of medicine. J Intern Med 2018; 284: 603–619.

21. Hwangbo, Kang, Kwon, et al. Stacking ensemble learning model to predict 6-month mortality in ischemic stroke patients. Sci Rep 2022; 12: 17389.

22. Gupta, Jain, Singh. Stacking Ensemble-Based Intelligent Machine Learning Model for Predicting Post-COVID-19 Complications. New Gener Comput 2022; 40: 987–1007.

23. Peng, Yang, et al. Machine learning approach for the prediction of 30-day mortality in patients with sepsis-associated encephalopathy. BMC Med Res Methodol 2022; 22: 183. Epub ahead of print. DOI: 10.1186/s12874-022-01664-z.

24. Rimel, Jane, Edlich RF. An injury severity scale for comprehensive management of central nervous system trauma. JACEP 1979; 8: 64–67.

25. Gill, Windemuth, Steele, et al. A comparison of the Glasgow Coma Scale score to simplified alternative scores for the prediction of traumatic brain injury outcomes. Ann Emerg Med 2005; 45: 37–42.

26. Eidelman, Putterman, et al. The spectrum of septic encephalopathy. Definitions, etiologies, and mortalities. JAMA 1996; 275: 470–473.

27. Vasto, Candore, Balistreri, et al. Inflammatory networks in ageing, age-related diseases and longevity. Mech Ageing Dev 2007; 128: 83–91.

28. Opal, Girard, Ely. The Immunopathogenesis of Sepsis in Elderly Patients. Clin Infect Dis 2005; 41: S504–S512.

