Development and validation of an explainable machine learning model for predicting prognosis in sepsis patients with a history of cancer who were admitted to the intensive care unit

Abstract

Background

Sepsis is the leading cause of mortality in critically ill cancer patients; however, traditional prognostic models fail to capture the complexity of their immune and physiological interactions.

Methods

This retrospective study analyzed electronic health records from the Medical Information Mart for Intensive Care IV database, including the records of patients with sepsis who had a documented history of cancer and were admitted to the intensive care unit. A two-step feature selection approach, combining least absolute shrinkage and selection operator regression and recursive feature elimination, was used to identify key prognostic variables. Eight machine learning algorithms, including random forest and extreme gradient boosting, were trained and evaluated using five-fold cross-validation. Model performance was assessed using the area under the receiver operating characteristic curve, Brier score, sensitivity, and specificity. SHapley Additive exPlanations, Partial Dependence Plots, and the break down algorithm were applied to enhance model interpretability.

Results

The final cohort included 3364 patients admitted to the intensive care unit. Nonsurvivors had significantly higher illness severity scores (Acute Physiology Score III and Sequential Organ Failure Assessment) than survivors (p < 0.001). Among the tested models, the random forest model demonstrated superior performance, achieving the highest area under the receiver operating characteristic curve value (0.78; 95% confidence interval: 0.76–0.80) and the lowest Brier score (0.15), indicating strong predictive accuracy.

Conclusions

This study developed machine learning models for predicting in-hospital mortality in sepsis patients with a history of cancer, leveraging the Medical Information Mart for Intensive Care IV database for comprehensive risk assessment.

Keywords

Machine learning, sepsis, cancer, prediction model, supervised learning

Introduction

Sepsis, a life-threatening organ dysfunction caused by a dysregulated host response to infection, is associated with high morbidity and mortality and is one of the most common admission diagnoses in critically ill cancer patients admitted to intensive care units (ICUs).^1–3 Immune suppression from underlying malignancies and cancer-related treatments often predisposes cancer patients to sepsis.⁴ Emerging evidence indicates that immune dysfunction, characterized by hyperinflammation and immunosuppression, drives sepsis progression, causing early organ damage. Moreover, in patients with severe immune dysfunction, fatal complications may occur in the later stages.⁵ Patients with a history of cancer are estimated to have a 10-fold higher risk of developing sepsis than those without a history of cancer.⁶ Recent observational studies in ICU settings have revealed that 15%–20% of patients with sepsis have underlying hematologic or solid malignancies.⁷ Given the high mortality rate in cancer patients with sepsis, prognosis assessment requires refined criteria that incorporate not only robust clinical practices but also data-driven predictive models for early identification, personalized diagnosis, and timely multidisciplinary interventions to improve survival outcomes.⁸

Traditional prognostic assessment methods based on single indices or scoring systems (e.g. Sequential Organ Failure Assessment (SOFA) score) often do not fully reflect the multifactorial nature of sepsis in cancer patients, where intricate interactions between physiological and immune parameters contribute to disease progression. The lack of multi-feature integration further limits the development of comprehensive and precise prognostic models tailored to this patient population. Moreover, empirical evidence regarding mortality outcomes and prognostic factors in critically ill cancer patients with sepsis remains scarce, particularly following the adoption of the Sepsis-3 criteria.⁹ Although several prognostic models have recently been developed for sepsis patients with cancer, their utility for early risk stratification remains limited.^10,11 This is primarily because these models incorporate variables collected after admission or during the course of treatment—such as the duration of antibiotic therapy and vasopressor use—which are not typically available at the time of ICU admission. Moreover, these studies rely on traditional statistical methods such as logistic regression, which may not fully capture the complexity and heterogeneity of the patient population, thereby limiting their predictive performance.

To overcome the limitations of traditional methods, recent studies have shifted toward machine learning (ML) approaches.^12–15 In critical care medicine, where disease progression is influenced by multiple interacting factors, ML has been widely applied in diagnostic and prognostic modeling.^16–18 However, despite the existence of multiple scoring models for predicting sepsis severity and mortality in geriatric populations, studies developing prognostic models specifically for in-hospital mortality in cancer patients with sepsis remain limited.^19–22 The Medical Information Mart for Intensive Care (MIMIC) database is a vast, openly accessible repository containing de-identified health data from thousands of patients treated in ICUs at Beth Israel Deaconess Medical Center from 2001 to 2019.²³ The extensive clinical data from the MIMIC database provide robust and reliable support for predicting the prognostic risks in patients with sepsis complicated by tumors. Utilizing the expansive clinical data from the MIMIC database, this study aimed to identify sepsis patients with concomitant tumors and elucidate the risk factors associated with in-hospital mortality, thereby addressing a critical research gap in the prognostic modeling for this complex patient cohort.

Methods

Research cohorts

This retrospective study analyzed the electronic health record (EHR) data from the MIMIC-IV database, which comprises comprehensive clinical data of patients admitted to the ICU. The inclusion criteria were as follows: (a) patients aged ≥18 years admitted to the ICU with sepsis as a primary or secondary diagnosis, defined according to the Sepsis-3 criteria; (b) those with a documented cancer diagnosis (either active or in remission);³ and (c) patients with complete clinical data, including physiological and laboratory measurements within the first 24 h post-admission. The cohort was randomly stratified into training and validation sets (7:3 ratio), with a balanced distribution of clinical characteristics and outcomes between the two groups. Approval to use data from the MIMIC-IV database was obtained (certification no.: 48533840). As the data were de-identified, the need for informed consent was waived. The methods in this study were conducted in accordance with the Helsinki Declaration of 1975, as revised in 2024. The reporting of this study conforms to the Strengthening the Reporting of Observational studies in Epidemiology (STROBE) guidelines.²⁴
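The 7:3 stratified split described above can be illustrated with a minimal sketch. It assumes stratification on the outcome label only; the function name `stratified_split`, the seed, and the toy label vector (built from the survivor and nonsurvivor counts reported in Table 1) are illustrative assumptions and do not reproduce the study's actual partition.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, valid_idx), splitting each class at roughly train_frac."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # random assignment within the class
        cut = round(train_frac * len(indices))    # class-specific training size
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)


# Toy labels: 0 = survivor, 1 = nonsurvivor (counts taken from Table 1).
labels = [0] * 2555 + [1] * 809
train_idx, valid_idx = stratified_split(labels)
```

Splitting within each outcome class keeps the mortality rate nearly identical in the two sets, which is the property that a comparison of training and validation characteristics is meant to check.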

Clinical variables

The primary outcome measure was ICU mortality. Clinical variables were categorized as follows: (a) demographics: age, sex, height (cm), and weight (kg); (b) vital signs: heart rate (beats/min), respiratory rate (breaths/min), temperature (°C), systolic blood pressure (SBP; mmHg), diastolic blood pressure (DBP; mmHg), and mean arterial pressure (mmHg); (c) laboratory parameters: pulse oximetric oxygen saturation (SpO₂; %), hematocrit (%), hemoglobin (g/dL), platelet count (×10⁹/L), white blood cell count (×10⁹/L), albumin (g/dL), anion gap (mmol/L), bicarbonate (mmol/L), blood urea nitrogen (BUN; mg/dL), calcium (mg/dL), chloride (mmol/L), creatinine (mg/dL), glucose (mg/dL), sodium (mmol/L), potassium (mmol/L), international normalized ratio (INR), prothrombin time (PT; s), partial thromboplastin time (PTT; s), alanine aminotransferase (ALT; U/L), aspartate aminotransferase (AST; U/L), alkaline phosphatase (ALP; U/L), and total bilirubin (mg/dL); (d) scoring systems: Glasgow Coma Scale (GCS), SOFA, and Acute Physiology Score III (APSIII); (e) comorbidities: myocardial infarction, congestive heart failure, peripheral vascular disease, cerebrovascular diseases, dementia, chronic obstructive pulmonary disease (COPD), rheumatic disease, peptic ulcer disease, diabetes, paraplegia, renal disease, and acquired immunodeficiency syndrome (AIDS). Variables with >25% missing values were excluded. For the remaining variables, multiple imputation was applied using predictive mean matching.
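The missingness rule stated above (exclusion of variables with more than 25% missing values) can be sketched as follows. The variable names `var_a` and `var_b` and their values are hypothetical placeholders, missing entries are assumed to be encoded as `None`, and the subsequent imputation step is not shown.

```python
def drop_sparse_variables(columns, max_missing=0.25):
    """Keep variables whose fraction of missing (None) values is at most max_missing."""
    kept = {}
    for name, values in columns.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= max_missing:
            kept[name] = values
    return kept


# Hypothetical toy data: var_a has 1/8 missing, var_b has 4/8 missing.
toy = {
    "var_a": [88, 97, None, 102, 79, 90, 84, 110],
    "var_b": [None, 2.1, None, None, 4.0, None, 1.2, 3.3],
}
kept = drop_sparse_variables(toy)
```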

Development of ML models

To develop ML models for predicting the prognosis of ICU-admitted sepsis patients with cancer, the study followed a structured workflow encompassing feature selection, model training, hyperparameter optimization, and performance evaluation. During variable selection, a two-step approach was used to identify the most relevant predictors. First, least absolute shrinkage and selection operator (LASSO) regression was applied to the training dataset to perform preliminary feature selection by shrinking the less important coefficients to zero. Then, recursive feature elimination (RFE) was applied using the features retained by LASSO to identify the optimal combination of variables. RFE iteratively eliminated the less important features based on model performance, ensuring a parsimonious feature set. Model training and hyperparameter optimization were performed using the training set. Five-fold cross-validation was employed to divide the training set into five subsets, with each subset serving as a validation fold once, while the remaining four subsets were used for training. During this process, hyperparameters were optimized using grid search to identify the configuration that maximized model performance. The study implemented a diverse set of eight ML algorithms: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naive Bayes, k-nearest neighbors (KNN), decision tree (DT), random forest (RF), XGBoost, and support vector machine (SVM). Model evaluation was conducted on the validation set using metrics such as the area under the receiver operating characteristic curve (AUC; to assess discrimination); Brier scores (to evaluate calibration); and sensitivity, specificity, positive predictive value, and negative predictive value (to determine clinical utility).
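For readers less familiar with the evaluation metrics named above, the following minimal sketch computes them on toy data. It is an illustration written in plain Python under stated assumptions, not the study's R code: the outcome vector and predicted probabilities are invented, both outcome classes are assumed to be present, and a cut-off of 0.5 is assumed for the threshold-dependent metrics.

```python
def auc(y_true, y_prob):
    """Probability that a random positive case is scored above a random negative case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count half
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def confusion_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, PPV, and NPV at a given probability cut-off."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Invented toy predictions (1 = event, 0 = no event).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
toy_auc = auc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)
toy_metrics = confusion_metrics(y_true, y_prob)
```

The AUC and Brier score use the predicted probabilities directly, whereas sensitivity, specificity, and the predictive values depend on the chosen cut-off.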

Interpretability of ML models

To enhance ML model interpretability and understand the impact of each feature on predictions, interpretability methods were applied, including SHapley Additive exPlanations (SHAP), Partial Dependence Plots (PDPs), and the break down algorithm. SHAP values were used to quantify the contribution of each feature to the model’s predictions, offering global and local interpretability. SHAP summary plots highlighted the most influential features across the dataset, while SHAP dependence plots provided insights into the relationship between specific features and the predicted outcomes. PDPs were employed to visualize the marginal effect of a feature on the predicted outcome, averaged over the observed values of the other features, allowing for a global understanding of nonlinear relationships. The break down algorithm was applied to analyze individual predictions by decomposing each prediction into additive feature contributions: starting from the average model prediction, the patient’s feature values are fixed one at a time, and the resulting change in the average prediction is attributed to the corresponding feature. This provided a detailed explanation of the factors influencing predictions for individual patients, which could be particularly useful for clinical decision-making. Together, these interpretability techniques ensured that the models were not only accurate but also transparent and clinically meaningful.
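The logic of a break down decomposition for a single patient can be sketched in a few lines. This is a simplified illustration, assuming a fixed feature order and a hypothetical risk function `toy_risk` with invented coefficients; it is not the fitted RF model or the software used in the study.

```python
def break_down(predict, data, instance, order):
    """Fix features to the instance's values one at a time and record the
    change in the mean prediction attributed to each feature."""
    rows = [dict(row) for row in data]            # work on copies of the background data
    mean_prediction = lambda rs: sum(predict(r) for r in rs) / len(rs)
    baseline = mean_prediction(rows)              # intercept: average prediction
    contributions = {}
    previous = baseline
    for feature in order:
        for row in rows:
            row[feature] = instance[feature]      # condition on the patient's value
        current = mean_prediction(rows)
        contributions[feature] = current - previous
        previous = current
    return baseline, contributions


def toy_risk(row):
    """Hypothetical risk function with invented coefficients."""
    return 0.004 * row["apsiii"] + 0.02 * row["sofa"]


background = [
    {"apsiii": 40, "sofa": 4},
    {"apsiii": 50, "sofa": 6},
    {"apsiii": 60, "sofa": 8},
]
patient = {"apsiii": 80, "sofa": 10}
baseline, contributions = break_down(toy_risk, background, patient, ["apsiii", "sofa"])
```

Once every feature has been fixed, the baseline plus the sum of contributions equals the model's prediction for that patient. For models with interactions, the individual contributions can depend on the order in which features are fixed.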

Statistical analysis

Continuous data were presented as medians with interquartile ranges (IQRs), while categorical variables were reported as counts (percentages). Differences in clinical variables obtained from the EHR were assessed using the Wilcoxon rank-sum test for continuous variables and either the chi-square test or Fisher’s exact test for categorical variables. All analyses were conducted using R software (version 4.2.0).
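As a worked example of the categorical comparison, the sketch below applies a Pearson chi-square test to the COPD counts reported in Table 1. It assumes no continuity correction, which may differ from the settings used in the original R analysis; the function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# COPD in Table 1: survivors 684 yes / 1871 no; nonsurvivors 254 yes / 555 no.
statistic, p_value = chi_square_2x2(684, 1871, 254, 555)
```

Under these assumptions, the resulting p-value should fall close to the value of 0.011 reported for COPD in Table 1.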

Results

Research cohorts

The final study cohort comprised 3364 patients; their baseline characteristics are presented in Table 1. The inclusion flowchart is shown in Figure 1. Nonsurvivors were significantly older than survivors (median age: 70 vs. 68 years; p = 0.010) and demonstrated greater illness severity, as evidenced by higher APSIII and SOFA scores (p < 0.001). Compared with survivors, nonsurvivors exhibited higher heart rates, lower blood pressure levels, and more severe metabolic derangements and organ dysfunction, indicated by elevated BUN levels and decreased bicarbonate and albumin levels (all p < 0.001). The prevalence of COPD was significantly higher among nonsurvivors (31% vs. 27%; p = 0.011).

Table 1.

Baseline characteristics of sepsis patients with cancer.

Characteristic	Overall, N = 3364	Alive, n = 2555	Death, n = 809	p
Demographic information
Age (years)	69 (60, 77)	68 (60, 77)	70 (61, 79)	0.010
Sex				0.5
Female	1246 (37%)	938 (37%)	308 (38%)
Male	2118 (63%)	1617 (63%)	501 (62%)
Height (cm)	170 (163, 178)	170 (163, 178)	169 (160, 175)	0.007
Weight (kg)	77 (65, 92)	78 (66, 93)	74 (63, 88)	<0.001
Vital signs
Heart rate (beats/min)	90 (79, 102)	88 (77, 99)	97 (84, 108)	<0.001
Respiratory rate (breaths/min)	20.1 (17.2, 23.1)	19.2 (17.1, 22.2)	21.2 (18.2, 24.3)	<0.001
Temperature (°C)	37.13 (37.03, 37.24)	37.14 (37.04, 37.24)	37.11 (37.00, 37.23)	<0.001
SBP (mmHg)	111 (103, 122)	112 (104, 123)	107 (100, 117)	<0.001
DBP (mmHg)	61 (55, 67)	61 (55, 67)	60 (54, 67)	0.2
MBP (mmHg)	74 (69, 81)	75 (69, 82)	73 (67, 79)	<0.001
Laboratory parameters
SpO₂ (%)	97.11 (95.28, 98.24)	97.15 (96.02, 98.26)	96.28 (95.08, 98.18)	<0.001
Hematocrit (%)	32.0 (28.1, 36.2)	32.1 (28.1, 36.3)	31.2 (27.2, 35.3)	0.003
Hemoglobin (g/dL)	10.27 (9.17, 12.14)	11.00 (9.20, 12.17)	10.18 (9.09, 12.03)	<0.001
Platelet count (×10⁹/L)	188 (119, 281)	192 (127, 279)	171 (88, 294)	<0.001
WBC count (×10⁹/L)	13 (8, 19)	13 (8, 19)	13 (8, 20)	0.10
Albumin (g/dL)	3.13 (2.29, 3.26)	3.14 (3.01, 3.27)	3.08 (2.21, 3.22)	<0.001
Anion gap (mmol/L)	16.2 (14.1, 19.2)	16.1 (14.0, 19.0)	17.3 (15.0, 21.2)	<0.001
Bicarbonate (mmol/L)	21.3 (18.2, 24.2)	22.0 (19.0, 24.2)	20.2 (16.2, 24.0)	<0.001
BUN (mg/dL)	25 (17, 41)	23 (16, 37)	32 (20, 50)	<0.001
Calcium (mg/dL)	8.30 (8.12, 9.19)	9.00 (8.12, 9.19)	8.29 (8.12, 9.22)	0.6
Chloride (mmol/L)	105 (101, 109)	105 (102, 109)	104 (100, 109)	<0.001
Creatinine (mg/dL)	1.23 (1.11, 2.16)	1.22 (1.10, 2.13)	1.27 (1.13, 2.24)	<0.001
Sodium (mmol/L)	136.2 (133.1, 139.2)	136.3 (133.3, 139.2)	136.0 (132.1, 139.2)	0.001
Potassium (mmol/L)	4.14 (4.02, 4.27)	4.14 (4.02, 4.26)	4.15 (4.03, 4.30)	0.006
Glucose (mg/dL)	133 (110, 165)	133 (112, 164)	130 (104, 173)	0.014
INR	1.26 (1.13, 2.15)	1.25 (1.12, 2.13)	2.03 (1.14, 2.21)	<0.001
PT (s)	15 (13, 19)	15 (13, 18)	16 (14, 21)	<0.001
PTT (s)	33 (29, 43)	33 (28, 42)	36 (29, 50)	<0.001
ALT (U/L)	31 (17, 75)	30 (17, 73)	33 (17, 85)	0.068
ALP (U/L)	105 (71, 193)	100 (69, 174)	122 (79, 231)	<0.001
AST (U/L)	46 (25, 120)	44 (25, 104)	55 (27, 151)	<0.001
Total bilirubin (mg/dL)	1.16 (0.26, 2.06)	1.15 (0.26, 2.03)	1.16 (0.26, 2.14)	0.12
Scoring systems
GCS score	14 (12, 15)	14 (13, 15)	14 (12, 15)	<0.001
SOFA score	5.0 (4.0, 8.0)	5.0 (3.0, 7.0)	7.0 (5.0, 10.0)	<0.001
APSIII score	50 (38, 64)	47 (36, 58)	63 (49, 80)	<0.001
Comorbidity
Myocardial infarction				0.3
No	2903 (86%)	2214 (87%)	689 (85%)
Yes	461 (14%)	341 (13%)	120 (15%)
Congestive heart failure				0.4
No	2543 (76%)	1941 (76%)	602 (74%)
Yes	821 (24%)	614 (24%)	207 (26%)
Peripheral vascular disease				0.4
No	3117 (93%)	2373 (93%)	744 (92%)
Yes	247 (7.3%)	182 (7.1%)	65 (8.0%)
Cerebrovascular diseases				0.086
No	3069 (91%)	2343 (92%)	726 (90%)
Yes	295 (8.8%)	212 (8.3%)	83 (10%)
Dementia				0.4
No	3289 (98%)	2495 (98%)	794 (98%)
Yes	75 (2.2%)	60 (2.3%)	15 (1.9%)
COPD				0.011
No	2426 (72%)	1871 (73%)	555 (69%)
Yes	938 (28%)	684 (27%)	254 (31%)
Rheumatic diseases				0.5
No	3253 (97%)	2474 (97%)	779 (96%)
Yes	111 (3.3%)	81 (3.2%)	30 (3.7%)
Peptic ulcer disease				0.4
No	3238 (96%)	2455 (96%)	783 (97%)
Yes	126 (3.7%)	100 (3.9%)	26 (3.2%)
Diabetes				0.036
No	2440 (73%)	1830 (72%)	610 (75%)
Yes	924 (27%)	725 (28%)	199 (25%)
Paraplegia				0.9
No	3251 (97%)	2470 (97%)	781 (97%)
Yes	113 (3.4%)	85 (3.3%)	28 (3.5%)
Renal disease				>0.9
No	2599 (77%)	1974 (77%)	625 (77%)
Yes	765 (23%)	581 (23%)	184 (23%)
AIDS				0.3
No	3329 (99%)	2531 (99%)	798 (99%)
Yes	35 (1.0%)	24 (0.9%)	11 (1.4%)
Duration of ICU stay (days)	3.0 (2.0, 5.0)	3.0 (2.0, 5.0)	4.0 (2.0, 7.0)	<0.001

SBP: systolic blood pressure; DBP: diastolic blood pressure; MBP: mean blood pressure; SpO₂: pulse oximetric oxygen saturation; WBC: white blood cell; INR: international normalized ratio; PT: prothrombin time; PTT: partial thromboplastin time; ALT: alanine aminotransferase; ALP: alkaline phosphatase; AST: aspartate aminotransferase; GCS: Glasgow Coma Scale; SOFA: Sequential Organ Failure Assessment; APSIII: Acute Physiology Score III; COPD: chronic obstructive pulmonary disease; AIDS: acquired immunodeficiency syndrome; ICU: intensive care unit.

Figure 1.

Cohort flowchart.

Development of ML models

A two-step process was used for variable selection, combining LASSO and RFE to identify the most predictive and relevant features. First, LASSO was applied using the lambda.1se criterion (Figure 2(a) and (b)), which retained 14 variables, including clinical scores (APSIII and SOFA), comorbidities (cerebrovascular diseases, COPD, rheumatic diseases, paraplegia, and AIDS), vital signs (heart rate, SBP, respiratory rate, and SpO₂), and laboratory or demographic features (weight and ALP). Subsequently, RFE was applied to further refine the best variable set, identifying an optimal combination of the following 11 features (Figure 2(c)): APSIII, SOFA, heart rate, respiratory rate, weight, SpO₂, ALP, SBP, cerebrovascular diseases, AIDS, and COPD. These 11 variables were used for the development of models. The training and validation sets (Table 2) showed no significant differences in baseline characteristics. Among the models tested (Table 3), the RF model demonstrated superior performance, achieving the highest AUC (AUC = 0.78, 95% confidence interval (CI): 0.76–0.80), lowest Brier score (0.15), and highest recall (0.98, 95% CI: 0.98–1.00), indicating strong discriminative ability and sensitivity. LDA and SVM also exhibited strong predictive performance, with AUC values of 0.77 (95% CI: 0.76–0.78) and 0.76 (95% CI: 0.75–0.78), respectively. The RF model’s confusion matrix for the validation set is shown in Figure 3(a); the RF model maintained strong predictive discrimination with a validation AUC of 0.75 (Figure 3(b)). The results suggest that RF provides the most robust prognostic model for sepsis patients with cancer.

Figure 2.

Results of variable selection.

Table 2.

Characteristics of the training and validation sets.

Characteristic	Training set, n = 2355	Validation set, n = 1009	p
Weight (kg)	77 (65, 92)	78 (65, 92)	0.6
Heart rate (beats/min)	90 (79, 101)	90 (78, 102)	0.6
Respiratory rate (breaths/min)	20.1 (17.2, 23.1)	19.3 (17.1, 23.1)	0.085
SBP (mmHg)	111 (103, 122)	111 (103, 123)	0.7
SpO₂ (%)	97.12 (95.25, 98.23)	97.14 (95.28, 98.24)	0.2
ALP (U/L)	102 (71, 187)	102 (69, 180)	0.4
APSIII score	49 (38, 63)	50 (39, 65)	0.6
SOFA score	5.0 (4.0, 8.0)	5.0 (4.0, 8.0)	0.8
Cerebrovascular diseases	203 (8.6%)	92 (9.1%)	0.6
AIDS	25 (1.1%)	10 (1.0%)	0.9
COPD	657 (28%)	281 (28%)	>0.9
Outcome			0.3
Alive	1776 (75%)	779 (77%)
Death	579 (25%)	230 (23%)

SBP: systolic blood pressure; SpO₂: pulse oximetric oxygen saturation; ALP: alkaline phosphatase; APSIII: Acute Physiology Score III; SOFA: Sequential Organ Failure Assessment; AIDS: acquired immunodeficiency syndrome; COPD: chronic obstructive pulmonary disease.

Table 3.

Performance of machine learning models in the training set with five-fold cross-validation.

Model	AUC (95% CI)	Brier score (95% CI)	Accuracy (95% CI)	Precision (95% CI)	Recall (95% CI)
DT	0.65 (0.60–0.69)	0.16 (0.16–0.18)	0.76 (0.74–0.78)	0.78 (0.77–0.81)	0.94 (0.92–0.97)
KNN	0.71 (0.70–0.73)	0.16 (0.16–0.17)	0.77 (0.75–0.79)	0.79 (0.78–0.81)	0.94 (0.94–0.96)
LDA	0.77 (0.76–0.78)	0.16 (0.14–0.17)	0.78 (0.78–0.80)	0.80 (0.79–0.82)	0.94 (0.94–0.95)
NB	0.76 (0.74–0.78)	0.17 (0.17–0.19)	0.77 (0.76–0.79)	0.81 (0.80–0.83)	0.90 (0.89–0.92)
QDA	0.75 (0.73–0.76)	0.17 (0.16–0.18)	0.77 (0.76–0.79)	0.80 (0.79–0.82)	0.92 (0.91–0.94)
RF	0.78 (0.76–0.80)	0.15 (0.15–0.16)	0.80 (0.76–0.81)	0.77 (0.76–0.79)	0.98 (0.98–1.00)
SVM	0.76 (0.75–0.78)	0.15 (0.14–0.16)	0.78 (0.77–0.80)	0.79 (0.78–0.81)	0.96 (0.95–0.98)
XGBoost	0.74 (0.72–0.75)	0.15 (0.15–0.17)	0.77 (0.76–0.79)	0.80 (0.79–0.83)	0.92 (0.92–0.94)

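AUC: area under the receiver operating characteristic curve; CI: confidence interval; DT: decision tree; KNN: k-nearest neighbors; LDA: linear discriminant analysis; NB: naive Bayes; QDA: quadratic discriminant analysis; RF: random forest; SVM: support vector machine; XGBoost: extreme gradient boosting.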

Bold font is used to emphasize the performance of the RF model.

Figure 3.

Performance of the random forest model in the validation set.

Interpretability of ML models

To enhance the interpretability of the RF model, multiple explainability techniques were applied. The SHAP summary plot (Figure 4) illustrates the contribution of individual features to model predictions. The horizontal axis represents the SHAP value, which indicates the impact of a feature on the model’s prediction for a particular outcome. Positive values suggest that the feature increases the probability of the predicted outcome, while negative values indicate that it decreases the probability. The vertical axis lists the features considered by the model in order of their importance. APSIII appears to be the most influential feature, with a wide distribution of SHAP values, indicating that it had a significant effect on the model’s predictions. SpO₂ also showed a considerable impact, exhibiting a trend where higher SpO₂ values (red dots) tended to have a positive SHAP value, suggesting that increased SpO₂ levels were associated with a higher probability of the predicted outcome.

Figure 4.

SHAP summary plot for RF model. SHAP: SHapley Additive exPlanations; RF: random forest.
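The quantity summarized in a SHAP plot can be illustrated by computing exact Shapley values for a very small model. The sketch below enumerates all feature orderings, which is feasible only for a handful of features; SHAP software relies on faster algorithms for tree models. The risk function `toy_risk`, its coefficients, and the background data are hypothetical and unrelated to the fitted RF model.

```python
from itertools import permutations


def shapley_values(predict, data, instance):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    features = list(instance)

    def value(fixed):
        # Mean prediction when features in `fixed` take the instance's values
        # and the remaining features keep their background values.
        total = 0.0
        for row in data:
            mixed = {f: (instance[f] if f in fixed else row[f]) for f in features}
            total += predict(mixed)
        return total / len(data)

    orderings = list(permutations(features))
    phi = {f: 0.0 for f in features}
    for order in orderings:
        fixed = set()
        previous = value(fixed)
        for f in order:
            fixed.add(f)
            current = value(fixed)
            phi[f] += (current - previous) / len(orderings)
            previous = current
    return value(set()), phi


def toy_risk(row):
    """Hypothetical risk function with an APSIII-by-SpO2 interaction (invented coefficients)."""
    return (0.2 + 0.004 * (row["apsiii"] - 50) - 0.01 * (row["spo2"] - 95)
            + 0.001 * (row["heart_rate"] - 90)
            + 0.0002 * (row["apsiii"] - 50) * (95 - row["spo2"]))


background = [
    {"apsiii": 40, "spo2": 98, "heart_rate": 80},
    {"apsiii": 50, "spo2": 96, "heart_rate": 90},
    {"apsiii": 65, "spo2": 93, "heart_rate": 105},
]
patient = {"apsiii": 80, "spo2": 90, "heart_rate": 110}
base_value, phi = shapley_values(toy_risk, background, patient)
```

The base value plus the sum of the Shapley values equals the prediction for the patient, which is the additivity property that allows per-feature contributions to be read directly from the horizontal axis of a summary plot.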

For instance-level explanations, break down plots (Figure 5) were generated, which decompose specific predictions into their feature contributions. Each plot illustrates the step-by-step changes in the predicted outcome as feature values are applied. In both plots, the analysis began with an intercept value of 0.24, representing the baseline prediction when no feature information was considered. Features were then added sequentially, with each feature contributing either positively or negatively to the final prediction. In this figure, the left plot demonstrates a scenario wherein multiple features collectively increase the model’s prediction. Features such as heart rate, respiratory rate, SpO₂, and APSIII had notable positive contributions, resulting in a final predicted value of 0.672. In contrast, the right plot shows a case wherein several features reduce the prediction. Notably, the SOFA score had a substantial negative impact, leading to a final predicted value of 0.166. These visualizations help clarify how different combinations of clinical variables influence model output, offering insight into the decision-making process of the predictive algorithm.

Figure 5.

Break down plots of instance explanation.

Additionally, PDPs (Figure 6) were used to visualize the marginal effect of each feature on the predicted probability of in-hospital mortality. PDPs help assess nonlinear relationships between key predictors and model outcomes, further improving model transparency. Regarding APSIII, the plot in the figure showed that higher APSIII scores were associated with an increased probability of mortality, indicating that the severity of illness strongly influences survival predictions. Regarding SpO₂, lower SpO₂ levels were correlated with a higher risk of mortality, highlighting the importance of maintaining adequate oxygen levels in patients. Regarding heart rate, an elevated heart rate was linked to a higher probability of mortality, suggesting cardiovascular stability as a critical factor in patient outcomes. Regarding respiratory rate, increased respiratory rates were associated with poorer outcomes, indicating the significance of respiratory function in predicting patient survival. Regarding SOFA, higher SOFA scores indicated greater organ dysfunction and were strongly correlated with an increased likelihood of mortality.

Figure 6.

PDPs for variables in the RF model. PDPs: Partial Dependence Plots.
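A partial dependence curve is obtained by forcing one feature to each value on a grid and averaging the model's predictions over the observed values of the remaining features. The following minimal sketch assumes a hypothetical risk function `toy_risk` with an invented threshold effect for SpO₂; it is meant only to show the computation, not to reproduce Figure 6.

```python
def partial_dependence(predict, data, feature, grid):
    """Mean prediction over the data with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in data]
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_risk(row):
    """Hypothetical risk: rises with APSIII and with SpO2 below 94% (invented coefficients)."""
    return 0.005 * (row["apsiii"] - 30) + 0.02 * max(0, 94 - row["spo2"])


background = [
    {"apsiii": 40, "spo2": 97},
    {"apsiii": 50, "spo2": 95},
    {"apsiii": 60, "spo2": 92},
]
spo2_grid = [88, 91, 94, 97]
spo2_curve = partial_dependence(toy_risk, background, "spo2", spo2_grid)
```

In this toy example, the averaged risk falls as SpO₂ rises toward 94% and is flat above it, the kind of nonlinear pattern that a PDP is designed to expose.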

Discussion

In this study, ML models were developed to predict ICU mortality in cancer patients with sepsis. Among the eight models tested, the RF model demonstrated the best overall performance, achieving AUC values of 0.78 in the training set and 0.75 in the validation set. Additionally, the RF model’s high sensitivity (recall = 0.98) and strong calibration (Brier score = 0.15) indicate its utility in clinical risk stratification, particularly in minimizing missed diagnoses. These findings highlight the feasibility of integrating ML approaches into critical care settings to enhance early mortality prediction and improve personalized patient management.

Nomograms are predicated on the assumption of a linear relationship between variables. Thus, their applicability is limited in complex diseases with multiple factor interactions, such as tumor-related sepsis. However, the RF model has the advantage of being able to conduct risk assessments when dealing with large amounts of data or complex situations. It is more flexible, automatic, and accurate than nomograms. Furthermore, compared with conventional scoring systems, ML models encompass a broader range of clinical data, including laboratory parameters and comorbidities, thereby enhancing the precision of prediction.²⁵

To ensure the interpretability and clinical applicability of the models, we applied interpretability methods such as SHAP, PDPs, and break down plots. These methods provided clear insights into the role of each feature in the model’s predictions and its impact on patient outcomes. For example, SHAP values helped identify the most influential features, while PDPs revealed nonlinear relationships between the features and predicted outcomes. These results not only provide more information about patient prognosis but also enable more informed decisions, which help increase the clinical applicability of the model.

Combining the LASSO and RFE selection methods yielded 11 variables. Among them, APSIII, SpO₂, heart rate, respiratory rate, and SOFA score demonstrated greater contribution than the other variables. APSIII is a scoring system based on 12 physiological parameters, such as heart rate and respiratory rate, which can help assess the severity of illness in critically ill patients and predict the mortality risk.^1,26,27 Our interpretability analysis confirmed that higher APSIII scores were strongly associated with an increased probability of mortality, aligning with established clinical evidence linking greater physiological derangement at admission to worse outcomes in sepsis. The SOFA score, a dynamic assessment of the severity of illness after ICU admission, indicates the degree of organ failure, which is consistent with the pathophysiological mechanisms of sepsis.^28,29 Our SHAP value analysis further illustrated how increasing SOFA subscores, particularly in cardiovascular and renal domains, correspond to higher mortality risk, reinforcing its clinical relevance in this patient population. Previous studies have also shown that mortality in sepsis patients is associated with higher APSIII and SOFA scores.^30,31

Furthermore, higher heart and respiratory rates and low SpO₂ levels are associated with an increased risk of mortality.³² In the context of sepsis, these physiological changes often represent compensatory mechanisms aimed at maintaining cardiac output, blood pressure, and oxygen delivery in response to systemic inflammation and tissue hypoperfusion. These findings align closely with the characteristic clinical manifestations of circulatory and respiratory failure in sepsis. Our model’s identification of these parameters as important predictors supports their well-established role in early risk assessment. Timely recognition and management of abnormal vital signs (for example, through targeted heart rate control, vasopressor therapy, or initiation of mechanical ventilation) are crucial for improving outcomes, especially in cancer patients who may have compromised physiological reserves.^33,34

Another notable finding of the present study is that ALP had a large weighting factor in our model. ALP is commonly used in clinical practice to evaluate liver function, and circulating ALP levels are elevated in liver dysfunction.³⁵ Sepsis-associated liver injury has been reported in up to 24.48% of patients with sepsis and leads to poor outcomes.^36,37 ALP levels are frequently elevated in sepsis-induced liver dysfunction, and this elevation is hypothesized to be a consequence of both direct liver injury and systemic inflammation.^38,39 The liver is not only a source of inflammatory mediators but also a target organ for the effects of inflammatory mediators.⁴⁰ Monitoring these biochemical markers is essential for the early detection and management of patients with sepsis. In addition, patients with tumors may have a higher risk of liver damage due to the complexity of their underlying disease and treatment. Moreover, both chemotherapeutic and immunotherapeutic drugs may cause liver toxicity and further increase the burden on the liver. In these patients, drugs that may aggravate liver damage should be avoided, and chemotherapy or immunotherapy regimens should be adjusted in a timely manner.

This study has certain limitations that warrant consideration. First, although the cohort was divided into training and validation sets in a ratio of 7:3, external validation was not performed. Multicenter studies are required to further assess the generalizability of the model. Given that this study was based on retrospective data, potential selection bias and the influence of unmeasured confounding factors cannot be entirely ruled out. Additionally, the prediction model exhibited a relatively high false negative rate, which may be attributed to class imbalance and the use of a default classification threshold of 50%. Acquiring more positive samples or applying threshold optimization techniques could help improve the model’s predictive performance.
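One common form of the threshold optimization mentioned above is to choose the cut-off that maximizes Youden's J (sensitivity + specificity − 1) instead of using the default of 0.5. The sketch below illustrates the idea on invented data with a minority event class; it is an assumption about how such an optimization might be carried out, not a procedure performed in this study.

```python
def youden_threshold(y_true, y_prob):
    """Return the cut-off (and its J value) maximizing sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_threshold, best_j = None, -1.0
    for t in sorted(set(y_prob)):                 # each observed probability is a candidate
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Invented toy data: 3 events among 10 patients, all predicted probabilities below 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.45, 0.40, 0.20, 0.35, 0.30, 0.15, 0.10, 0.10, 0.05, 0.05]
best_threshold, best_j = youden_threshold(y_true, y_prob)
```

In this example, a cut-off of 0.5 would classify no patient as an event, whereas the optimized cut-off recovers all three events at the cost of some false positives. In practice, the cut-off would need to be selected on training or cross-validation data to avoid optimistic estimates.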

Footnotes

Acknowledgements

Not applicable.

Author contributions

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Data availability statement

The study data will be shared upon request to the corresponding author.

Declaration of conflicting interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.

Ethical considerations

The Medical Information Mart for Intensive Care IV (MIMIC-IV) database was used, and the need for ethical approval was waived. The approval number is 48533840.

Funding

The research was funded by Jiangsu Province (Suqian) Hospital Research Project (ID: SY202407) and Jiangsu Province (Suqian) Hospital 136 Talent training plan (ID: 2307205101).

ORCID iD

Xiuji Kan

References

1. Singer, Deutschman, Seymour, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA 2016; 315: 801–810.

2. Abdul-Aziz, Hammond, Brett, et al. Prolonged vs intermittent infusions of β-lactam antibiotics in adults with sepsis or septic shock: a systematic review and meta-analysis. JAMA 2024; 332: 638–648.

3. Rudd, Johnson, Agesa, et al. Global, regional, and national sepsis incidence and mortality, 1990–2017: analysis for the Global Burden of Disease Study. Lancet 2020; 395: 200–211.

4. Awad, Nazer, Elfarr, et al. A 12-year study evaluating the outcomes and predictors of mortality in critically ill cancer patients admitted with septic shock. BMC Cancer 2021; 21: 709.

5. Van der Poll, Shankar-Hari, Wiersinga WJ. The immunology of sepsis. Immunity 2021; 54: 2450–2464.

6. Mirouse, Vigneron, Llitjos, et al. Sepsis and cancer: an interplay of friends and foes. Am J Respir Crit Care Med 2020; 202: 1625–1635.

7. Hensley, Donnelly, Carlton, et al. Epidemiology and outcomes of cancer-related versus non-cancer-related sepsis hospitalizations. Crit Care Med 2019; 47: 1310–1316.

8. Nates, Pène, Darmon, et al.; Nine-I Investigators. Septic shock in the immunocompromised cancer patient: a narrative review. Crit Care 2024; 28: 285.

9. Cuenca, Manjappachar, Ramírez, et al. Outcomes and predictors of 28-day mortality in patients with solid tumors and septic shock defined by Third International Consensus Definitions for Sepsis and Septic Shock Criteria. Chest 2022; 162: 1063–1073.

10. Jiang, Zhao, et al. A novel risk classifier to predict the in-hospital death risk of nosocomial infections in elderly cancer patients. Front Cell Infect Microbiol 2023; 13: 1179958.

11. Yang, Dong, et al. Development and validation of a nomogram for predicting the prognosis in cancer patients with sepsis. Cancer Med 2022; 11: 2345–2355.

12. Kijpaisalratana, Sanglertsinlapachai, Techaratsami, et al. Machine learning algorithms for early sepsis detection in the emergency department: a retrospective study. Int J Med Inform 2022; 160: 104689.

13. Peng, Yang, et al. Machine learning approach for the prediction of 30-day mortality in patients with sepsis-associated encephalopathy. BMC Med Res Methodol 2022; 22: 183.

14. Yong, Zhenzhou. Deep learning-based prediction of in-hospital mortality for sepsis. Sci Rep 2024; 14: 372.

15.

Fleuren

Klausch

TLT

Zwager

, et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med 2020; 46: 383–400.

16.

Zhou

Liu

, et al. Interpretable machine learning model for early prediction of 28-day mortality in ICU patients with sepsis-induced coagulopathy: development and validation. Eur J Med Res 2024; 29: 14.

17.

Yang

Zhang

, et al. Predictive value of soluble CD40L combined with APACHE II score in elderly patients with sepsis in the emergency department. BMC Anesthesiol 2024; 24: 32.

18.

De Hond

Raven

Schinkelshoek

, et al. Machine learning for developing a prediction model of hospital admission of emergency department patients: hype or hope? Int J Med Inform 2021; 152: 104496.

19.

Chen

Zong

Zou

, et al. A novel clinical prediction model for in-hospital mortality in sepsis patients complicated by ARDS: a MIMIC IV database and external validation study. Heliyon 2024; 10: e33337.

20.

Zhang

Huang

, et al. Prediction of prognosis in elderly patients with sepsis based on machine learning (random survival forest). BMC Emerg Med 2022; 22: 26.

21.

Chen

Lin

, et al. Transferability and interpretability of the sepsis prediction models in the intensive care unit. BMC Med Inform Decis Mak 2022; 22: 343.

22.

García-Gallo

Fonseca-Ruiz

Celi

, et al. A machine learning-based model for 1-year mortality prediction in patients admitted to an intensive care unit with a diagnosis of sepsis. Med Intensiva (Engl Ed) 2020; 44: 160–170.

23.

Bennett

Ulrich

Van Damme

, et al. MIMIC-IV on FHIR: converting a decade of in-patient data into an exchangeable, interoperable format. J Am Med Inform Assoc 2023; 30: 718–725.

24.

Von Elm

Altman

Egger

; STROBE Initiativeet al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol 2008; 61: 344–349.

25.

Yadgarov

Landoni

Berikashvili

, et al. Early detection of sepsis using machine learning algorithms: a systematic review and network meta-analysis. Front Med (Lausanne) 2024; 11: 1491358.

26.

Fan

The value of five scoring systems in predicting the prognosis of patients with sepsis-associated acute respiratory failure. Sci Rep 2024; 14: 4760.

27.

Keegan

Gajic

Afessa

Comparison of APACHE III, APACHE IV, SAPS 3, and MPM0III and influence of resuscitation status on model performance. Chest 2012; 142: 851–858.

28.

Vincent

De Mendonça

Cantraine

, et al. Use of the SOFA score to assess the incidence of organ dysfunction/failure in intensive care units: results of a multicenter, prospective study. Working group on “sepsis-related problems” of the European Society of Intensive Care Medicine. Crit Care Med 1998; 26: 1793–1800.

29.

Vincent

Moreno

Clinical review: scoring systems in the critically ill. Crit Care 2010; 14: 207.

30.

Sprung

Peduzzi

Shatney

, et al. Impact of encephalopathy on mortality in the sepsis syndrome. The Veterans Administration Systemic Sepsis Cooperative Study Group. Crit Care Med 1990; 18: 801–806.

31.

Zhang

Wang

, et al. Epidemiological features and risk factors of sepsis-associated encephalopathy in intensive care unit patients: 2008–2011. Chin Med J (Engl) 2012; 125: 828–831.

32.

Ning

, et al. Association between heart rate and mortality in patients with septic shock: an analysis revealed by time series data. BMC Infect Dis 2024; 24: 1088.

33.

Koch

Edinger

Fischer

, et al. Comparison of qSOFA score, SOFA score, and SIRS criteria for the prediction of infection and mortality among surgical intermediate and intensive care patients. World J Emerg Surg 2020; 15: 63.

34.

Morelli

Sanfilippo

Romano

SM.

Esmolol in septic shock: old pathophysiological concepts, an old drug, perhaps a new hemodynamic strategy in the right patient. J Thorac Dis 2016; 8: 3059–3062.

35.

Wangler

Jansky

[Evaluation of abnormal liver chemistries in primary care - a survey on the prerequisites, procedure and challenges faced by general practitioners]. Z Gastroenterol 2022; 60: 1203–1211.

36.

Wang

Zhang

, et al. Targeting ferroptosis offers therapy choice in sepsis-associated acute lung injury. Eur J Med Chem 2025; 283: 117152.

37.

Lin

Liang

Cai

, et al. [Analysis of high-risk factors and clinical characteristics of sepsis-related liver injury]. Zhonghua Wei Zhong Bing Ji Jiu Yi Xue 2021; 33: 186–191.

38.

Xue

Wang

, et al. Association between alkaline phosphatase/albumin ratio and the prognosis in patients with chronic kidney disease stages 1–4: results from a C-STRIDE prospective cohort study. Front Med (Lausanne) 2023; 10: 1215318.

39.

Wang

, et al. [Research progress in the mechanism of intestinal environmental disturbance on the occurrence and development of sepsis-associated liver injury]. Zhonghua Wei Zhong Bing Ji Jiu Yi Xue 2024; 36: 660–663.

40.

Szabo

Romics

Frendl

Liver in sepsis and systemic inflammatory response syndrome. Clin Liver Dis 2002; 6: 1045–1066.