Abstract
Background
This study aimed to develop and validate a machine learning model to predict in-hospital mortality in ICU patients with diabetes and cardiovascular diseases and to construct a web-based risk calculator to assist clinical decision-making.
Methods
This study retrospectively collected data on diabetic patients with cardiovascular diseases from the MIMIC-IV and eICU-CRD databases, including 4,074 patients from MIMIC-IV for model training and internal validation and 1,261 patients from eICU-CRD for external validation. Thirteen feature variables were selected using the Boruta algorithm, and eight machine learning algorithms were applied to construct prediction models. Model performance was evaluated using ROC-AUC, precision-recall curves, and calibration curves, and the SHAP algorithm was used for model interpretability.
Results
The Logistic Regression model demonstrated the best predictive performance, with an ROC-AUC of 0.896 (95%CI 0.834-0.943) and 0.820 (95%CI 0.768-0.869) for internal and external validation, respectively. It also achieved high sensitivity (0.851 and 0.931) in the internal and external validation cohorts. SHAP analysis indicated that vasopressor usage, APSIII score, GLU level, respiratory rate, and oxygen saturation were the most critical features influencing in-hospital mortality risk.
Conclusion
The developed Logistic Regression model exhibited high predictive accuracy and robustness in ICU patients with diabetes and cardiovascular diseases. A web-based risk calculator was constructed to provide personalized mortality risk assessment and decision support for clinicians.
Keywords
Introduction
Cardiovascular disease (CVD) and diabetes mellitus (DM) are two prevalent chronic conditions. Globally, CVD claims the lives of over 18 million individuals annually, while DM affects more than 400 million patients worldwide.1,2 CVD and DM frequently coexist, particularly among middle-aged and older adults.3,4 This coexistence may be attributed to risk factors shared by both diseases, such as obesity or overweight, dyslipidemia, smoking, and family history. In diabetes, pathological mechanisms such as insulin resistance and oxidative stress can cause macrovascular or microvascular damage, which substantially increases mortality risk among patients with both CVD and DM.5,6 A study conducted in Korea found that patients with DM and pre-DM had a 1.05-fold and 1.51-fold higher risk of developing heart failure, respectively, compared with those with normal blood glucose levels, and a 1.05-fold and 1.59-fold higher risk of myocardial infarction, respectively.7 A cohort study carried out in Malaysia found that patients with DM had twice the cardiovascular mortality rate of non-diabetic patients and were more prone to premature death.8 Recently, researchers have turned their attention to patients with CVD and comorbid DM in the intensive care unit (ICU). A substantial number of studies have indicated that diabetes-related markers, such as blood glucose levels and insulin resistance, are independently associated with an increased risk of in-hospital mortality in ICU patients with severe CVD.9–11 This highlights the high risk of in-hospital death for ICU patients with CVD and comorbid DM. However, there is still a lack of relevant risk-scoring systems to assist clinicians in early risk assessment.
Advances in technology have substantially increased the volume and dimensionality of data in contemporary clinical studies. This strengthens the credibility of such studies but also poses a considerable challenge for data analysis.12 Conventional clinical research methods are poorly suited to analyzing high-dimensional, complex data. Machine learning (ML), an important branch of artificial intelligence, can identify complex patterns in large-scale datasets and handle high-dimensional data, including nonlinear relationships among variables.13,14 Predictive models trained using ML algorithms have been extensively applied in clinical diagnosis and treatment decision-making and have demonstrated high accuracy and specificity. Shapley Additive Explanations (SHAP) is an interpretability technique that quantifies the contribution of each variable to a model's predictions and helps mitigate the “black-box” limitation that has long affected ML models.15,16 In earlier investigations, researchers employed ML algorithms to develop predictive models for assessing the prognosis of cardiovascular patients with comorbid DM within the general population.17,18 However, patients admitted to the ICU differ substantially from those in the general population. Despite the availability of numerous ICU scoring systems, there remains a lack of specific mortality risk scores tailored to patients with severe cardiovascular disease and comorbid DM.19,20 Therefore, there is an urgent need for an interpretable, visualizable machine learning prediction model that can assess the mortality risk of cardiovascular patients with concomitant diabetes mellitus in the ICU.
The primary objective of such a model is to serve as an individualized risk stratification tool that consolidates multiple pieces of clinical information within a short timeframe and supports consistent decision-making when risk assessments conflict with subjective judgments.
This study employed eight distinct ML methods to construct a model predicting in-hospital mortality risk among cardiovascular disease patients with concomitant DM in the ICU. Internal and external validation were conducted using the Medical Information Mart for Intensive Care IV (MIMIC-IV) and the eICU Collaborative Research Database (eICU-CRD) datasets, respectively. Subsequently, the SHAP method was employed to elucidate the contribution of each variable within the optimal machine learning model, and a web-based clinical decision calculator was developed. Figure 1 provides an overview of the study design.
Schematic of the process of constructing machine learning predictive models in this study and summary of the research design. Abbreviations: MIMIC-IV, Medical Information Mart for Intensive Care IV; eICU-CRD, eICU Collaborative Research Database; AUC, Area Under the Curve; SHAP, SHapley Additive exPlanations.
Materials and methods
Study population
The raw data for this study were primarily obtained from two independent ICU databases, the MIMIC-IV database and the eICU-CRD database.21,22 The MIMIC-IV database contains longitudinal, de-identified data of ICU patients admitted to Beth Israel Deaconess Medical Center between 2008 and 2019, and is maintained by the Massachusetts Institute of Technology Laboratory for Computational Physiology. The database includes clinical information on more than 65,000 unique patients. The investigator completed the required credentialing and obtained authorized access to MIMIC-IV (record ID: 59123180). The eICU-CRD is a multicenter critical care database developed by Philips Healthcare, comprising de-identified health data for over 200,000 ICU admissions from 335 ICUs across more than 200 hospitals in the United States between 2014 and 2015. Both databases provide de-identified data, and the researchers were unable to identify individual patients.
To keep the data consistent between internal and external validation, we extracted the same feature variables from the MIMIC-IV and eICU-CRD datasets wherever possible and applied the same inclusion and exclusion criteria wherever possible. The inclusion criteria were as follows: 1) patients who were first admitted to the ICU and stayed for 24 hours; 2) patients with the diagnosis of cardiovascular disease; 3) patients aged ≥18 years; 4) patients with comorbid DM. The exclusion criteria were as follows: 1) patients without a diagnosis of cardiovascular disease or diabetes; 2) patients with 20% or more of the characteristic variables missing; and 3) patients who lacked a record of follow-up to discharge or in-hospital death. The only difference was that, for eICU-CRD, which served as the independent external validation dataset, we removed all variables with missing values to ensure the reliability of the external validation. In the end, 4,074 individuals from the MIMIC-IV database and 1,261 individuals from the eICU-CRD database were included in this study. The inclusion and exclusion process is detailed in Supplementary Figure 1.
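The eligibility rules above can be read as a simple record filter. The following is a minimal sketch, not the study's extraction code: the record layout and every field name (`icu_stay_seq`, `icu_los_hours`, `has_cvd`, `has_dm`, `died_in_hospital`, `features`) are hypothetical and do not correspond to actual MIMIC-IV or eICU-CRD columns, and the 24-hour stay criterion is interpreted here as "at least 24 hours", which is an assumption.

```python
def missing_fraction(features):
    """Share of candidate feature variables whose value is missing (None)."""
    return sum(v is None for v in features.values()) / len(features)


def is_eligible(rec, max_missing=0.20):
    """Apply the inclusion and exclusion criteria to one patient record."""
    return bool(
        rec["icu_stay_seq"] == 1                 # first ICU admission
        and rec["icu_los_hours"] >= 24           # assumption: read as at least 24 h
        and rec["age"] >= 18                     # adults only
        and rec["has_cvd"] and rec["has_dm"]     # both diagnoses present
        and rec["died_in_hospital"] is not None  # outcome recorded
        and missing_fraction(rec["features"]) < max_missing  # drop if 20% or more missing
    )


def make(pid, **overrides):
    """Build a hypothetical, fully eligible record and override selected fields."""
    rec = {"id": pid, "icu_stay_seq": 1, "icu_los_hours": 48, "age": 67,
           "has_cvd": True, "has_dm": True, "died_in_hospital": 0,
           "features": {"glu": 182.0, "hr": 91.0, "rr": 20.0, "spo2": 96.0, "temp": 36.8}}
    rec.update(overrides)
    return rec


patients = [
    make(1),                    # meets every criterion
    make(2, age=17),            # under 18
    make(3, features={"glu": None, "hr": 91.0, "rr": 20.0, "spo2": 96.0, "temp": 36.8}),  # 1 of 5 missing
    make(4, icu_stay_seq=2),    # not the first ICU admission
]
cohort = [p["id"] for p in patients if is_eligible(p)]
print(cohort)
```

In this toy example only the first record passes; the third is dropped because exactly 20% of its features are missing, matching the "20% or more" wording of the exclusion rule.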
Data collection and definition
For baseline data collection, we extracted the data using Structured Query Language (SQL) in PostgreSQL (version 16.0). Characteristic variables included in the baseline data were as follows. Demographic characteristics: gender, age, ethnicity, weight, and height. Vital signs: temperature, respiratory rate, heart rate, oxygen saturation, systolic blood pressure (SBP), and diastolic blood pressure (DBP). Laboratory parameters: red blood cell count (RBC), white blood cell count (WBC), platelet count (PLTC), creatinine (Cre), blood urea nitrogen (BUN), total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), hemoglobin A1c (HbA1c), high-density lipoprotein cholesterol (HDL-C), Na, K, glucose (GLU), albumin (ALB), and hemoglobin (Hb). Critical care scores: Glasgow Coma Scale (GCS) and Acute Physiology and Chronic Health Evaluation III (APACHE III). Comorbidities: acute myocardial infarction, chronic kidney disease, acute kidney injury, atrial fibrillation, hypertension, cardiogenic shock, ischemic stroke, DM, and congestive heart failure. Medications: aspirin, insulin, statins, metoprolol, diuretics, vasoactive drugs, angiotensin receptor blockers (ARBs), and angiotensin-converting enzyme inhibitors (ACEIs). Outcome: in-hospital death (yes, no). Clinical procedures: continuous renal replacement therapy, mechanical ventilation, and percutaneous coronary intervention (PCI).
The cardiovascular disease diagnosis criteria employed in this study utilized ICD-9 and ICD-10 diagnostic codes, defined as a composite comorbidity encompassing coronary artery disease, ischemic stroke, peripheral artery disease, and heart failure. This composite definition captures the overall burden of cardiovascular comorbidities among DM patients in the ICU, which holds clinical significance for short-term prognosis and case mix adjustment. We regard it as a baseline comorbidity indicator rather than a single mechanistic disease entity. All indicators were obtained within 24 hours of admission (APSIII scores were calculated within 24 hours of ICU admission).
Model variable selection
Prior to the selection of variables for inclusion in the ML model, we employed the MissForest technique to impute the missing values within the MIMIC-IV dataset. Subsequently, the finalized MIMIC-IV data incorporated in the study were partitioned into a training subset (n = 3,259) and an internal validation subset (n = 815) in an 8:2 ratio. In the subsequent phase, we assessed pairwise Spearman correlations among candidate predictors and applied a redundancy filter (|ρ| > 0.80) by retaining a single representative variable from each highly correlated pair. This step was not intended to improve random forest fitting, but to reduce redundant predictors and improve the interpretability and stability of downstream Boruta feature selection, as variable importance can be redistributed across correlated features. Feature selection was then performed using the Boruta algorithm. Following the application of the Boruta algorithm, we conducted both univariate and multivariate logistic regression analyses. The objective of these analyses was to further elucidate the independent impacts and interrelationships of the selected characteristic variables on the outcome of in-hospital mortality. By synthesizing the results from these two variable selection approaches, we determined the definitive set of variables for constructing the ML model.
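The redundancy filter described above can be sketched as follows. This is an illustrative, standard-library-only version rather than the study's code: the toy values are invented, and keeping the first-listed variable of each highly correlated pair is an assumption, since the text does not state which member of a pair was retained.

```python
from itertools import combinations


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def redundancy_filter(data, threshold=0.80):
    """Drop the later-listed variable of every pair with |rho| > threshold."""
    dropped = set()
    for a, b in combinations(data, 2):
        if a in dropped or b in dropped:
            continue
        if abs(spearman(data[a], data[b])) > threshold:
            dropped.add(b)
    return [name for name in data if name not in dropped]


# Invented toy values for three candidate predictors.
data = {
    "hb":  [9.1, 10.4, 12.0, 13.5, 14.2, 11.1],
    "rbc": [3.0, 3.6, 4.1, 4.6, 4.4, 3.9],
    "glu": [180, 95, 240, 130, 110, 300],
}
kept = redundancy_filter(data)
print(kept)
```

In practice the same step is usually a few lines with a data-frame library; the explicit version is shown only to make the rank-then-correlate logic visible.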
Machine learning model construction and validation
After the model feature variables had been selected, we applied one-hot encoding to the categorical variables. This method converts categorical variables into binary indicator columns, which removes any implied ordering and avoids unintended hierarchies in the data. After the included variables were processed, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training set to balance the distribution of in-hospital deaths and non-deaths. By generating synthetic minority-class samples through interpolation in the feature space, SMOTE balanced the class distribution of in-hospital deaths and non-deaths to a 1:1 ratio. Importantly, the validation and test sets retained their original class proportions to ensure unbiased assessment of real-world model performance.
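The interpolation step at the heart of SMOTE can be illustrated with a short sketch. It is a simplified, standard-library-only version written for illustration; it is not the implementation used in the study, and the feature vectors, `k`, and `seed` are invented.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x towards nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Invented training data: [APSIII score, respiratory rate] per patient.
deaths = [[82.0, 26.0], [75.0, 24.0], [90.0, 30.0]]   # minority class
n_survivors = 6                                        # majority class size
new_points = smote(deaths, n_new=n_survivors - len(deaths))
balanced_deaths = deaths + new_points                  # now 1:1 with survivors
print(len(balanced_deaths), n_survivors)
```

Because every synthetic point lies between two real minority samples, it stays inside the range the minority class already covers. Oversampling is applied to the training set only, as stated above.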
Subsequently, eight ML algorithms were used to construct prediction models for in-hospital death based on the training set: Decision Tree (DT), Gradient Boosting Decision Tree (GBDT), K-Nearest Neighbor (KNN), Multilayer Perceptron (MLP), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), and logistic regression (LR). LR was additionally incorporated as a baseline model. The DT algorithm is simple but provides interpretable insights into interactions. The GBDT algorithm combines multiple weak decision trees and handles nonlinear data well. The KNN algorithm is easy to deploy and to understand, the RF algorithm is particularly robust in analyzing structured data, and the MLP algorithm excels in handling complex patterns in data. The LightGBM algorithm is known for its high efficiency and fast training on large-scale data. The XGBoost algorithm incorporates regularization to prevent overfitting and handles missing values effectively. Hyperparameters of all eight models were tuned, and 5-fold cross-validation was performed in both the training set and the internal validation set, to ensure the robustness and reliability of the results. To evaluate the results, we used ROC curves with the corresponding AUC, calibration curves, precision-recall (PR) curves, and confusion matrices to assess the prediction performance of the different ML models, and we plotted heatmaps and radar charts to visualize and compare their accuracy. Subsequently, we conducted external validation in the eICU-CRD dataset. Finally, combining the results of the training set, internal validation, and external validation, we selected the model with the best predictive performance for the risk of in-hospital death in cardiovascular patients with comorbid DM in the ICU and developed a web-based risk calculator based on it.
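As a reminder of how the confusion-matrix metrics mentioned above relate to one another, the sketch below computes them from binary labels and predictions. The labels are invented for illustration and the function name is hypothetical. Note that sensitivity and recall are the same quantity, TP / (TP + FN).

```python
def classification_metrics(y_true, y_pred):
    """Summary metrics from the 2x2 confusion matrix (1 = in-hospital death)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)      # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)        # positive predictive value
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "sensitivity": sensitivity, "specificity": specificity,
        "precision": precision, "accuracy": accuracy, "f1": f1,
    }


# Invented labels: 3 deaths among 10 patients, predictions at a fixed threshold.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, round(value, 3))
```

All of these values depend on the probability threshold used to turn predicted risks into class labels, whereas ROC-AUC summarizes discrimination across all thresholds.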
Model explanation
To gain a more intuitive understanding of how the machine learning model produces its output, we employed the SHAP algorithm to quantify the contribution of each feature variable within the selected model with the strongest predictive capability, and we ranked the variables by importance. The SHAP algorithm uses Shapley values as a quantitative interpretive framework, estimating the impact of each feature variable on the model's output. These values quantify how much each feature variable contributes to an individual prediction, and they allow variable contributions to be visualized and compared side by side. Throughout visualization, red or positive values denote risk factors for in-hospital mortality, while blue or negative values represent protective factors. Finally, we randomly selected patients and applied the chosen optimal machine learning prediction model to provide personalized predictions of their risk of in-hospital mortality during ICU admission.
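For a linear model such as logistic regression, Shapley values have a simple closed form on the log-odds scale when features are treated as independent: the contribution of feature i is its coefficient times the difference between the patient's value and the background mean. The sketch below illustrates this property only; the coefficients, intercept, background means, and patient values are all hypothetical and are not the fitted model's.

```python
# Hypothetical logistic regression parameters (log-odds scale).
INTERCEPT = -3.0
WEIGHTS = {"apsiii": 0.05, "vasoactive": 1.2, "spo2": -0.06}

# Hypothetical background (training-set) means of each feature.
BACKGROUND_MEANS = {"apsiii": 45.0, "vasoactive": 0.3, "spo2": 96.0}


def log_odds(x):
    """Linear predictor of the logistic model for one patient."""
    return INTERCEPT + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)


def linear_shap(x):
    """Per-feature contribution: weight * (value - background mean)."""
    return {name: WEIGHTS[name] * (x[name] - BACKGROUND_MEANS[name]) for name in WEIGHTS}


patient = {"apsiii": 70.0, "vasoactive": 1.0, "spo2": 92.0}
phi = linear_shap(patient)
base_value = log_odds(BACKGROUND_MEANS)  # model output at the average patient

# Rank features by absolute contribution, largest first.
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
for name in ranking:
    direction = "raises" if phi[name] > 0 else "lowers"
    print(f"{name}: {phi[name]:+.2f} ({direction} predicted risk)")
```

The contributions add up exactly to the gap between this patient's log-odds and the baseline, which is the additivity property that force plots visualize: positive terms correspond to red segments and negative terms to blue ones.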
Statistical analyses
In the baseline analysis of the MIMIC-IV and eICU-CRD datasets, we divided the data into in-hospital mortality and non-mortality groups. For continuous variables, we first conducted normality tests. For variables that followed a normal distribution, we employed independent-samples t-tests; for non-normally distributed variables, we used Mann-Whitney U tests. The results for these continuous variables were presented as the median and interquartile range (Q1, Q3). For categorical variables, we employed chi-square tests to compare differences between the two groups, and the results were presented as percentages (%).
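For a binary variable, the chi-square comparison between the two groups reduces to a 2x2 table. The sketch below computes the Pearson chi-square statistic and its p-value for one degree of freedom using only the standard library. The counts are invented, and whether a continuity correction was applied in the study is not stated, so none is used here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts: rows = died / survived, columns = vasoactive drug used / not used.
stat, p_value = chi_square_2x2(40, 20, 100, 240)
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

The p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper tail equals erfc(sqrt(x / 2)).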
To enable comparison with conventional ICU scoring systems, APSIII was evaluated using ROC analysis in all cohorts, with AUC values and 95% confidence intervals reported. Furthermore, formal pairwise comparisons of AUCs were conducted among all machine learning models and between the best-performing model and APSIII using the DeLong test, providing a statistical assessment of differences in discriminative performance. In the multivariate logistic regression models, we adjusted for all characteristic variables screened by the Boruta algorithm. Associations were reported as odds ratios (OR) with the corresponding 95% confidence intervals (CI). All statistical analyses and model construction in this study were carried out in Python 3.9.0 and R 4.3.2. Python was predominantly used for constructing machine learning models and developing the web-based risk calculator. The key Python libraries were scikit-learn (version 1.2.2), SHAP (version 0.42.1), and Shiny (version 1.24.1). All statistical tests were two-sided, and statistical significance was defined as a p-value less than 0.05.
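The odds ratios and confidence intervals reported for logistic regression follow directly from each coefficient and its standard error: OR = exp(β), with a Wald interval of exp(β ± z·SE). A minimal sketch is given below; the coefficient and standard error are hypothetical values chosen for illustration, not estimates from the study.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95):
    """Odds ratio and Wald confidence interval from a logistic regression
    coefficient (beta) and its standard error (se)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error for a binary predictor.
or_, lower, upper = odds_ratio_ci(beta=1.47, se=0.169)
print(f"OR={or_:.2f}, 95%CI={lower:.2f}-{upper:.2f}")
```

An interval that lies entirely above 1 indicates a risk factor at the chosen significance level, one entirely below 1 indicates a protective factor, and one that contains 1 is not statistically significant.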
Results
MIMIC-IV and eICU-CRD baseline information
Baseline characteristics of MIMIC-IV.
Baseline characteristics of eICU-CRD.
Subsequently, to identify potential collinearity, we conducted a Spearman correlation analysis on all feature variables within the MIMIC-IV dataset, which served as both the training set and the internal validation set. We defined significant collinearity between feature variables as an absolute correlation coefficient greater than 0.8. The results of this correlation analysis are presented as a heat map in Supplementary Figure 2. From this heat map, we identified significant collinearity between Hb and RBC, with a correlation coefficient of 0.85, and between LDL-C and TC, with a correlation coefficient of 0.87. This finding implies that the two variables in each of these pairs cannot be incorporated into the final model simultaneously. No significant collinearity was detected among the remaining feature variables.
Variable screening and logistic regression
Prior to implementing the ML algorithms for model construction, we employed the Boruta algorithm to screen the feature variables and ranked them according to their importance. Ultimately, we selected 13 feature variables for subsequent ML model building: age, respiratory rate, the presence of cardiogenic shock, use of diuretics, heart rate, use of aspirin, oxygen saturation, administration of vasoactive drugs, use of ACEI/ARB, use of metoprolol, GLU levels, body temperature, and APSIII scores. The detailed screening results of the Boruta algorithm are presented in Figure 2. Notably, there was no collinearity among the screened feature variables, which suggests that all of them can be incorporated into the model-building process.
Results of the Boruta method for selecting the feature variables for ML modeling, where green indicates the variables finally included in the modeling. Abbreviations: TG, triglycerides; TC, total cholesterol; PCI, percutaneous coronary intervention; LDL, low-density lipoprotein cholesterol; ALB, albumin; RBC, red blood cell count; CRE, creatinine; PLT, platelet count; HDL, high-density lipoprotein cholesterol; ICH, intracerebral hemorrhage; HGB, hemoglobin; HTN, hypertension; AF, atrial fibrillation; CHF, congestive heart failure; AKI, acute kidney injury; CKD, chronic kidney disease; AMI, acute myocardial infarction; GCS, Glasgow Coma Scale; WBC, white blood cell count; BUN, blood urea nitrogen; DBP, diastolic blood pressure; SBP, systolic blood pressure; RR, respiratory rate; CS, cardiogenic shock; HR, heart rate; GLU, glucose; APSIII, Acute Physiology Score III (APACHE III score).
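The decision rule behind Boruta can be sketched in a few lines. In each iteration, every real feature's importance is compared with the best importance achieved by shuffled "shadow" copies of the features; a feature that beats the shadows significantly more often than chance is confirmed, and one that does so significantly less often is rejected. The sketch below shows only this hit-counting step under simplifying assumptions: the importance values are invented, they are supplied directly rather than computed by a random forest, and the multiple-testing correction and iterative feature removal of the full algorithm are omitted.

```python
from math import comb


def binom_tail_ge(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_decisions(importance_history, shadow_max_history, alpha=0.05):
    """Classify each feature from its per-iteration importances.

    A 'hit' is an iteration in which the feature's importance exceeds the
    maximum importance among the shadow features in that iteration."""
    n = len(shadow_max_history)
    decisions = {}
    for name, history in importance_history.items():
        hits = sum(imp > shadow for imp, shadow in zip(history, shadow_max_history))
        if binom_tail_ge(hits, n) < alpha:        # far more hits than chance
            decisions[name] = "confirmed"
        elif binom_tail_ge(n - hits, n) < alpha:  # far fewer hits than chance
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


# Invented importances over 10 iterations (not taken from the study).
shadow_max = [0.020] * 10
history = {
    "apsiii": [0.11, 0.10, 0.12, 0.10, 0.11, 0.09, 0.10, 0.12, 0.11, 0.10],
    "wbc":    [0.03, 0.01] * 5,
    "tg":     [0.005] * 10,
}
decisions = boruta_decisions(history, shadow_max)
print(decisions)
```

With p = 0.5, the probability of at most `hits` successes equals the probability of at least `n - hits` successes, which is why the same tail function serves both tests.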
After screening the 13 characteristic variables, we examined their independent associations with in-hospital death using univariate and multivariate logistic regression analyses. In univariate logistic regression analysis, age (OR=1.02, 95%CI=1.00-1.03, p=0.021), temperature (OR=0.95, 95%CI=0.94-0.96, p<0.001), heart rate (OR=1.01, 95%CI=1.00-1.02, p=0.028), respiratory rate (OR=1.07, 95%CI=1.05-1.09, p<0.001), oxygen saturation (OR=0.95, 95%CI=0.92-0.97, p<0.001), APSIII score (OR=1.05, 95%CI=1.05-1.06, p<0.001), GLU (OR=1.00, 95%CI=1.00-1.01, p<0.001), cardiogenic shock (OR=3.61, 95%CI=2.53-5.17, p<0.001), ACEI/ARB (OR=0.31, 95%CI=0.23-0.42, p<0.001), aspirin (OR=0.32, 95%CI=0.21-0.47, p<0.001), metoprolol (OR=0.35, 95%CI=0.24-0.50, p<0.001), diuretics (OR=0.63, 95%CI=0.44-0.91, p=0.013), and vasoactive drugs (OR=4.35, 95%CI=3.12-6.05, p<0.001) were all significantly associated with in-hospital death, which supports the variable selection made by the Boruta algorithm. In multivariate logistic regression analysis, after adjusting for the feature variables, all variables except heart rate remained significant, which again supports the rationale for the selected modeling variables and the stability of the model (Figure 3).
Forest plots of univariate and multivariate logistic regression analyses exploring the associations of the screened characteristic variables with in-hospital death in patients with cardiovascular disease and comorbid DM in the ICU. In the figure, an OR below 1 with a 95% CI that excludes 1 indicates a protective factor for in-hospital death, whereas an OR above 1 with a 95% CI that excludes 1 indicates a risk factor.
Abbreviations: HR, heart rate; RR, respiratory rate; SpO2, peripheral capillary oxygen saturation; APSIII, Acute Physiology Score III (APACHE III score); GLU, glucose; CS, cardiogenic shock.
Model performance comparison and external validation
After confirming the 13 feature variables for modeling, we randomly partitioned the MIMIC-IV dataset into a training set and a validation set at a ratio of 8:2 and balanced the training data using SMOTE. Model performance improved substantially after SMOTE balancing compared with before balancing; for the results before balancing, please refer to Supplementary Figure 8 and Supplementary Figure 9. Subsequently, the LR, DT, XGBoost, LightGBM, GBDT, KNN, MLP, and RF algorithms were employed to construct ML predictive models within the training set. All these models underwent five-fold cross-validation, and the results are illustrated in Figure 4. Furthermore, we carried out independent external validation based on the eICU-CRD dataset. In both the internal and the external validation, the LR model demonstrated the highest ROC-AUC values, which were 0.896 and 0.820, respectively. It also demonstrated high sensitivity (0.851 and 0.931) and recall (0.851 and 0.948). The consistency of the results from the internal and external validations attests to the strong predictive performance and general applicability of the LR predictive model (Supplementary Table 3). Additionally, the PR curves and the calibration curves further reinforce the reliability of our findings (Supplementary Figure 3 and Supplementary Figure 4).
ROC curves showing the performance of the machine learning models on the training set, internal validation set, and external validation set, respectively. Abbreviations: ROC, receiver operating characteristic; DT, Decision Tree; GBDT, Gradient Boosting Decision Tree; KNN, K-Nearest Neighbor; MLP, Multilayer Perceptron; RF, Random Forest; LR, logistic regression; XGBoost, Extreme Gradient Boosting.
Combining the results from the internal and external validation sets, we found that the LR model outperformed the other seven machine learning prediction models. To present the results more comprehensively, we performed a formal statistical comparison of the ROC curves using the DeLong test, incorporating 95% confidence intervals. Specifically, we selected the best-performing model, LR, as the reference model and performed pairwise DeLong tests against all other classifiers. The results showed that LR significantly outperformed the other models (Supplementary Table 4). Because the databases cover general (mixed) ICUs, we performed a stratified sensitivity analysis by ICU type (cardiology, internal medicine, and surgery) to assess whether different patient types influenced the outcome. The results showed consistently robust predictive performance in all subgroups, as shown in Supplementary Figure 5.
Predictive model interpretability and interaction effects
Building upon the LR model, which demonstrated the highest predictive performance, we employed the SHAP method to conduct a model interpretability analysis. This analysis aimed to quantify the magnitude and direction of the contribution that each of the 13 feature variables made to the prediction outcomes. In order of contribution, the variables were as follows: APSIII, ACEI/ARB, vasoactive drugs, GLU levels, age, respiratory rate, the presence of cardiogenic shock, aspirin, beta-blocker, body temperature, diuretics, oxygen saturation, and heart rate (Figure 5(a) and (b)). In the figures, red signifies that a variable is a risk factor for in-hospital death, whereas blue represents a protective factor. Moreover, we constructed a stacked force plot for the LR model. In the force plots, red variables are factors that raise the predicted risk of in-hospital death, whereas blue variables lower this risk. The length of each arrow corresponds to the magnitude of the variable's influence on the prediction, with longer arrows indicating a larger impact. This plot offers a clustered view based on the SHAP values of all patients. With a single click, researchers can view the SHAP values of each patient's model variables, enabling them to assess the role of the characteristic variables at both the cohort level and the individual level (Figure 5(c)).
Illustration of SHAP interpretability of feature variables based on the LR model. (a) Ranking and direction of the contribution of the feature variables to the prediction of in-hospital death using the LR model. Red indicates that the variable increases the predicted risk of in-hospital death, and blue indicates that the variable decreases it. (b) Distribution of SHAP contributions of key features across individuals (horizontal axis, instances; vertical axis, features). Colors indicate the direction and intensity of each feature's influence on the model output. (c) Stacked force plot providing a clustered view of the SHAP values of the variables across the validation set for the LR model. Abbreviations: HR, heart rate; RR, respiratory rate; SpO2, peripheral capillary oxygen saturation; APSIII, Acute Physiology Score III (APACHE III score); GLU, glucose; CS, cardiogenic shock.
To explore potential interaction effects among model variables, we constructed a SHAP interaction network, which summarizes how predictors jointly influence the model's risk assessment (Supplementary Figure 6). In this network, APSIII made the largest overall contribution, but it acts as a summary of overall patient severity and clinical context rather than as a mechanistic driver. Notably, the model showed stronger interaction patterns between APSIII and several common clinical signals, including blood glucose, vasoactive drug use, ACEI/ARB, and key vital signs. This may indicate that clinical features carry different weights at different levels of baseline disease severity. Furthermore, because APSIII, which is included in this model, is itself an established score, we compared the predictive power of the complete model directly with that of the APSIII score alone to quantify the incremental value of the new model. As shown in Supplementary Figure 7, the complete model performs better overall than APSIII alone, with higher discrimination and overall predictive accuracy.
Online web risk calculator
A web-based clinical decision support tool was developed and deployed using the Python Shiny framework. The calculator enables clinicians to input patient-specific clinical variables and obtain real-time individualized predictions of in-hospital mortality risk based on the final optimized machine learning model (Figure 6). The application was implemented in Python (version 3.12) and deployed on a secure cloud-based server, ensuring stable performance and broad accessibility. A user-friendly graphical interface was designed to facilitate intuitive operation in routine clinical practice. The web-based calculator is publicly accessible at https://yanzewu.shinyapps.io/shinyweb/, and step-by-step usage instructions are provided in the Supplementary Materials.
Online web calculator used to calculate the risk of in-hospital mortality in cardiovascular patients with comorbid diabetes in the ICU (https://yanzewu.shinyapps.io/shinyweb/).
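At its core, a calculator built on a logistic regression model turns the entered clinical values into a probability through the logistic function. The sketch below illustrates that calculation only. The intercept, coefficients, and the subset of variables are hypothetical placeholders and are NOT the fitted model's values; the deployed application's code is not reproduced here.

```python
import math

# Hypothetical parameters for illustration only (not the study's fitted model).
INTERCEPT = -4.0
COEFS = {
    "apsiii": 0.05,       # per point of APSIII score
    "vasoactive": 1.2,    # 1 if vasoactive drugs were used, else 0
    "acei_arb": -1.0,     # 1 if ACEI/ARB was used, else 0
    "glu": 0.003,         # per mg/dL of blood glucose
}


def predict_risk(patient):
    """Predicted probability of in-hospital death for one patient."""
    logit = INTERCEPT + sum(COEFS[name] * patient[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-logit))


patient = {"apsiii": 60, "vasoactive": 1, "acei_arb": 0, "glu": 200}
same_patient_on_acei = dict(patient, acei_arb=1)

print(f"predicted risk: {predict_risk(patient):.1%}")
print(f"with ACEI/ARB:  {predict_risk(same_patient_on_acei):.1%}")
```

A web front end only needs to collect the input values, call a function of this kind, and display the resulting probability, optionally alongside the per-variable contributions.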
Discussion
In this study, we constructed ML predictive models to forecast mortality risk in cardiovascular disease patients with concomitant diabetes admitted to the ICU. In both internal and external validation, the LR model demonstrated optimal predictive performance, achieving ROC-AUC values of 0.896 and 0.820, respectively. Subsequently, SHAP interpretability analysis revealed that the most influential feature variables included in the model’s decision-making process were, in descending order: vasoactive drug use, APSIII score, blood glucose level, respiratory rate, and oxygen saturation. Finally, based on the optimal model, we developed a real-time web calculator within a telemedicine system specifically designed to assess this type of in-hospital mortality risk. This system aims to help clinicians promptly identify the in-hospital mortality risk in such patients, thereby facilitating timely and informed medical decisions.
In a study focused on predicting the 10-year CVD risk among diabetic patients, the researchers employed ML techniques to develop a prediction model, and the optimal model obtained an ROC-AUC value of 0.761 (6). In another investigation aimed at predicting CVD risk among diabetic inpatients in South Korea, the researchers selected the LR model as the most effective model for risk prediction. This model achieved an ROC-AUC value of 0.84 in internal validation and 0.72 in external validation.23 The model incorporated Cre, GLU, LDL-C, and the use of diuretics, among other variables, and thus shares some similarities with our model. Nevertheless, our study focused on predicting the overall risk of in-hospital mortality for patients with cardiovascular disease and comorbid diabetes. The results of the SHAP interpretability analysis visually elucidated how each variable contributes to our LR model. From these results, we identified multiple variables, including vasoactive drugs, APSIII score, blood glucose level, respiratory rate, oxygen saturation, body temperature, age, heart rate, diuretics, and ACEI/ARB drugs, that contributed substantially to the prediction of in-hospital mortality risk. Among these variables, higher APSIII scores, absence of ACEI/ARB use, use of vasoactive drugs, elevated blood glucose levels, and high heart rate contributed the most to the risk of in-hospital mortality.
In the predictive model of this study, the APSIII score is one of the largest contributors and is the cornerstone of the new predictive model. The APSIII score is one of the scoring systems commonly used in the ICU to assess the severity of a patient's condition and predict prognosis. The higher the APSIII score, the worse the patient's prognosis, which is consistent with our model interpretability analysis.24 Furthermore, although the APSIII score may not be as effective as modern machine learning models in predicting the risk of death in critically ill patients, its predictive ability on its own remains high, which can help clinicians better assess the risk of patient death.25,26 We also found that the use of ACE inhibitors or angiotensin receptor blockers (ARBs) ranked second in contribution to reducing the risk of in-hospital mortality in cardiovascular patients with DM in the ICU. ACE inhibitors and ARBs act predominantly on the renin-angiotensin-aldosterone system. Clinically, they are widely used for their antihypertensive properties and their ability to counteract ventricular remodeling.27,28 Research has shown that these medications are associated with a lower in-hospital mortality risk in various cardiovascular diseases during an ICU stay, and that they play a significant role in relevant machine learning prediction models, which is largely in line with our findings.29,30 This suggests that early use of ACE inhibitors or ARBs may improve prognosis when clinicians encounter such high-risk patients in the ICU.
The SHAP interpretability analysis revealed that the use of vasoactive medications was associated with an elevated predicted risk of in-hospital death. In a recent study carried out by the Society for Cardiovascular Angiography and Interventions, the researchers found that the use of two or more vasoactive drugs was linked to a poor prognosis, which aligns with our findings.31,32 This finding was further supported by another investigation conducted by Mexican researchers, which demonstrated that in patients with post-infarction cardiogenic shock, the use of more than two vasoactive medications significantly increased the risk of death.33 Moreover, improper use of vasoactive medications can lead to several adverse outcomes: it may increase the risk of arrhythmias, elevate myocardial oxygen demand, and, in severe cases, result in systemic microcirculatory ischemia.34,35 These findings underscore the need for clinicians to carefully weigh the clinical benefits of vasoactive drugs when managing critically ill cardiovascular patients with comorbid DM. Doing so may help avoid the increased mortality risk that may be linked to excessive or prolonged administration of such medications.
GLU levels also play a substantial role. Glycemic management is an integral component of ICU patient care, and in our model higher blood glucose levels were associated with a higher predicted risk of in-hospital death. Critically ill patients are frequently susceptible to stress-induced hyperglycemia, which is associated with a notably high mortality rate and may account for the elevated risk of in-hospital mortality observed in the model when GLU levels are higher.36,37 Furthermore, elevated blood glucose levels may lead physicians to use more potent glucose-lowering drugs, which in turn can cause significant fluctuations in blood glucose levels. Therefore, it is routinely recommended that blood glucose levels in ICU patients be controlled within 140–180 mg/dL.38 Studies have shown that glycemic fluctuations in ICU patients with cardiovascular disease are significantly associated with both short-term and long-term outcomes.3,39,40 This indicates that clinicians should not only prevent their patients' GLU levels from rising too high but also adopt gentle glucose-lowering strategies that keep glycemic fluctuations within a narrow range.
An elevated heart rate was also identified as a contributor to the increased risk of in-hospital mortality in the model.41 In patients with heart failure and myocardial infarction, an increased heart rate has been shown to be an independent risk factor for death. This is because patients with severe cardiovascular disease often have myocardial injury, and the rapid increase in myocardial oxygen consumption that accompanies a higher heart rate accelerates the deterioration of cardiac function and raises the risk of death.42,43 Similarly, the other risk factors identified in our study, such as decreased oxygen saturation, advanced age, and a higher respiratory rate, all of which can elevate the risk of in-hospital death, are consistent with previous research. This supports the reliability of our results.44–48
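The 140–180 mg/dL target and the emphasis on limiting glycemic fluctuations discussed above can be summarized for a single ICU stay with a few simple statistics. The sketch below is illustrative only: it uses the coefficient of variation as one common measure of glycemic variability, which is an assumption on our part, since the cited studies may define variability differently, and the function name and sample readings are hypothetical.

```python
from statistics import mean, stdev


def glycemic_summary(readings_mg_dl, low=140.0, high=180.0):
    """Summarize glucose control for one stay from a list of readings in mg/dL."""
    if len(readings_mg_dl) < 2:
        raise ValueError("need at least two glucose readings")
    avg = mean(readings_mg_dl)
    in_range = sum(low <= g <= high for g in readings_mg_dl)
    return {
        "mean_mg_dl": avg,
        # Coefficient of variation: sample standard deviation relative to the mean
        "cv_percent": 100.0 * stdev(readings_mg_dl) / avg,
        # Share of readings inside the recommended target window
        "fraction_in_target": in_range / len(readings_mg_dl),
    }


summary = glycemic_summary([150, 160, 170, 200])
print(summary)
```

A stay with an acceptable mean glucose but a high coefficient of variation would be flagged by this kind of summary, which is the situation the gentle glucose-lowering strategy is meant to avoid.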
The limitations of this study should not be overlooked. First, this is a retrospective clinical study and therefore carries the inherent limitations of retrospective data, such as the inability to establish causal relationships and the difficulty of controlling for confounding factors. Second, although we used two independent public databases for internal and external validation to improve the reliability of our findings, both databases were drawn from US populations, and no data from Chinese or other populations were used for validation. This limits the generalizability of our results. In future research, we plan to integrate multicenter ICU data from Asian and broader ethnic groups to further validate the applicability of our model across different patient populations. Third, our model was developed and validated using historical ICU cohort data from MIMIC-IV (2014–2019) and eICU-CRD (2014–2015). Over the past decade, ICU clinical practice and diabetes management have changed considerably, and such temporal changes may affect model performance when the model is applied to contemporary ICU populations. As this study lacked access to the latest versions of these databases, we were unable to conduct formal temporal validation, recalibration, or retraining on new cohorts. Future research should validate the model using more contemporary datasets and retrain it if necessary. Fourth, the web-based calculator proposed in this study is currently only a prototype based on retrospectively developed models. It has not undergone prospective validation, usability testing, or workflow integration assessment, and its clinical applicability as a decision support tool remains to be determined. Future work should include prospective multicenter validation, end-user usability testing, and safety and implementation assessments prior to routine clinical adoption. Finally, complete consistency in laboratory testing and disease diagnosis between the internal and external validation datasets cannot be guaranteed, which may introduce bias into our results.
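The recalibration mentioned among the limitations can be sketched as follows. This is a minimal illustration of logistic recalibration, in which an intercept a and slope b are fitted so that the updated risk is sigmoid(a + b * logit(p)) for an original prediction p. It is not an analysis performed in this study; the function names and the toy cohort are hypothetical, and a real update would use a contemporary cohort and a vetted statistics library.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _logit(p, eps=1e-6):
    p = min(max(p, eps), 1.0 - eps)  # guard against probabilities of exactly 0 or 1
    return math.log(p / (1.0 - p))


def fit_recalibration(pred_probs, outcomes, n_iter=50):
    """Fit a, b in sigmoid(a + b * logit(p)) by Newton-Raphson maximum likelihood."""
    xs = [_logit(p) for p in pred_probs]
    a, b = 0.0, 1.0  # start from "no recalibration"
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, outcomes):
            mu = _sigmoid(a + b * x)
            w = mu * (1.0 - mu)
            g_a += y - mu          # gradient of the log-likelihood w.r.t. a
            g_b += (y - mu) * x    # gradient w.r.t. b
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12:
            break  # degenerate data, e.g. all predictions identical
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def recalibrate(p, a, b):
    """Apply the fitted intercept and slope to an original predicted probability."""
    return _sigmoid(a + b * _logit(p))


# Toy cohort in which the original model overestimates risk
toy_preds = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
toy_outcomes = [1] + [0] * 9 + [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4
a_hat, b_hat = fit_recalibration(toy_preds, toy_outcomes)
print(f"intercept={a_hat:.3f}, slope={b_hat:.3f}")
```

An intercept near 0 and a slope near 1 would indicate that the original predictions remain well calibrated in the new cohort; larger departures would suggest that recalibration or retraining is needed.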
Conclusion
This study developed a machine learning prediction model based on the LR algorithm to provide personalized in-hospital mortality risk prediction for patients with both cardiovascular disease and DM. The model demonstrated excellent predictive performance in both internal and external validation. Building on this, we further developed a web-based scoring calculator system to provide these patients with personalized early risk assessments, helping clinicians to identify high-risk groups early and take timely intervention measures, potentially improving in-hospital survival and clinical outcomes.
Supplemental material
Supplemental material - Machine Learning–Based risk stratification for in-hospital mortality in ICU patients with cardiovascular diseases and diabetes
Supplemental material for Machine Learning–Based risk stratification for in-hospital mortality in ICU patients with cardiovascular diseases and diabetes by Huabin He, Yanze Wu, Ruyi Tao, Huijian Wang, Huangxin Zhu, Qingyun Yu and Qingan Fu in Digital Health.
Footnotes
Acknowledgements
We are grateful to the participants in the Medical Information Mart for Intensive Care-IV database and the eICU Collaborative Research Database. We also thank BioRender for help with the drawings.
Ethical considerations
This study was a secondary analysis of two publicly available, de-identified databases (MIMIC-IV and eICU-CRD). MIMIC-IV was approved by the Institutional Review Boards of the Massachusetts Institute of Technology (No. 0403000206) and Beth Israel Deaconess Medical Center (2001-P-001699/14). Because all data were de-identified, the requirement for informed consent was waived, and no additional ethics approval was required for the present analysis. The eICU-CRD dataset was de-identified and released under the HIPAA Safe Harbor provisions (Certification No. 1031219-2); therefore, no further ethical clearance was required for this study.
Consent for publication
All authors have consented to the publication of the paper.
Author contributions
Huabin He and Yanze Wu conceptualized and designed this study. Ruyi Tao and Huijian Wang performed the data extraction and initial analysis. Huangxin Zhu assisted with data cleaning and data proofreading. Qingyun Yu prepared the initial manuscript draft. Qingan Fu participated in the critical revision of the manuscript and supervised the study. All authors participated in editing, reviewing, and approving the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Jiangxi Province 03 Special Project & 5G Project (20232ABC03A22).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets analyzed in this study are available in the MIMIC and eICU repositories (https://mimic.physionet.org/). More information about the data can be obtained by contacting the corresponding author.
Supplemental material
Supplemental material for this article is available online.
References
