An interpretable machine learning model for predicting acute respiratory distress syndrome in critically ill patients with acute pancreatitis: A multicenter retrospective study

Abstract

Objective

Acute respiratory distress syndrome (ARDS) drives early mortality in severe acute pancreatitis (AP). Since conventional tools often fail to capture complex physiological interactions, we aimed to develop and validate an interpretable machine learning (ML) model for early ARDS prediction and deploy it as a web-based calculator.

Methods

This multicenter retrospective study utilized data from the MIMIC-IV database for model development and internal validation, and an independent cohort from Changshu Hospital for external validation. Optimal predictors were identified through a hybrid feature selection strategy combining LASSO regression and the Boruta algorithm. Seven ML algorithms were constructed, including random forest (RF), extreme gradient boosting, support vector machine, logistic regression, light gradient boosting machine, k-nearest neighbors, and decision trees. Model performance was evaluated by discrimination (AUC), calibration curves, and clinical utility (DCA). Model interpretability was assessed using SHapley Additive exPlanations (SHAP) and partial dependence plots (PDP).

Results

A total of 905 patients from the MIMIC-IV cohort (25.0% ARDS incidence) and 126 from the external cohort (20.6% incidence) were included. Nine independent predictors were identified: body mass index (BMI), respiratory rate, temperature, SOFA score, white blood cell count, PO₂, PCO₂, mechanical ventilation, and antibiotic use. The RF model demonstrated best performance (internal AUC 0.851) and maintained robust generalization in the external cohort (AUC 0.823). Calibration curves indicated good agreement between predicted and observed probabilities, and DCA showed superior net benefit across clinically relevant thresholds. SHAP analysis identified ventilation, SOFA score, BMI, PO₂, and respiratory rate as the most influential predictors.

Conclusion

A high-performing, interpretable RF model was developed for early ARDS prediction in critically ill AP patients. The model effectively captured complex physiological interactions and demonstrated robustness across diverse populations. By integrating this algorithmic framework into a user-friendly web calculator, the tool supports personalized risk stratification and timely clinical decision-making.

Keywords

acute pancreatitis, acute respiratory distress syndrome, machine learning, random forest, interpretability, external validation

Introduction

As a widespread gastrointestinal condition, acute pancreatitis (AP) has seen an upward trend in incidence, leading to substantial medical and economic costs.^1,2 Although most individuals experience only localized and self-resolving inflammation, approximately 20% develop severe manifestations characterized by persistent systemic inflammatory response syndrome (SIRS) and multi-organ functional decline.^3,4 Among these systemic complications, acute respiratory distress syndrome (ARDS) is one of the most frequent and lethal manifestations of pulmonary dysfunction, affecting up to 30% of patients.⁵ It is a major contributor to early death, responsible for roughly 60% of deaths in patients with severe acute pancreatitis (SAP) during the first week.⁶ The onset of ARDS is often rapid, significantly prolonging intensive care unit (ICU) stays and increasing mortality risk.⁷ These observations underscore the urgent need for early and accurate prediction of ARDS in AP patients to guide timely clinical interventions and reduce morbidity and mortality in this high-risk population.

Various predictive tools have been evaluated to facilitate early risk identification. Single clinical and laboratory parameters, such as white blood cell (WBC) count, platelet count, lactate dehydrogenase, creatinine, albumin (ALB), and triglycerides, have been investigated for their prognostic value.^8,9 However, these markers are often non-specific and susceptible to confounding by concurrent infections or other inflammatory conditions, which limits their individual predictive accuracy. Traditional clinical scoring systems have also been employed, including organ dysfunction metrics such as the Sequential Organ Failure Assessment (SOFA) and quick SOFA (qSOFA), physiological assessments such as the Acute Physiology and Chronic Health Evaluation II (APACHE-II), and AP-specific tools such as the Ranson criteria and the Bedside Index for Severity in Acute Pancreatitis (BISAP).^5,9,10 Yet these tools are often limited by delayed assessment windows, operational complexity, and suboptimal predictive accuracy or specificity for pulmonary complications.^8,11 Conventional statistical models based on logistic regression (LR) and Cox proportional hazards regression have also been developed to integrate multiple risk factors for ARDS in AP.^8,9 Although they improve on single indicators, these models assume largely linear, additive effects and therefore struggle to capture the high-dimensional, non-linear interactions that characterize the pathophysiology of critical illness, which may lead to suboptimal predictive performance.¹²

Machine learning (ML) has become an increasingly important methodology in acute medicine. By analyzing large volumes of clinical data, ML algorithms can identify complex, non-linear relationships between clinical variables and patient outcomes that traditional models often miss.^13,14 Although several studies have explored ML for predicting AP complications, significant gaps hinder their translation into clinical practice. Many existing models are derived from single-center datasets, which limits their generalizability across diverse populations.¹⁵ Moreover, the “black box” nature of complex algorithms often obscures how predictions are made, reducing clinician trust.¹⁶ Crucially, few studies have bridged the gap between theoretical algorithms and practical bedside application.

The main goal of this study was to use ML to identify clinical factors that predict the occurrence of ARDS in patients with AP. By constructing a prediction model from the MIMIC-IV database, validating it in an external cohort, and deploying it as an online web calculator, we aimed to enable accurate early prediction that can guide timely interventions and improve outcomes for these vulnerable patients.

Methods

Data source

The training and internal validation datasets were extracted from the Medical Information Mart for Intensive Care IV version 3.0 (MIMIC-IV v3.0) database, which contains extensive and well-documented data from 65366 patients admitted to the ICU at Beth Israel Deaconess Medical Center (Boston, MA, USA) between 2008 and 2022.¹⁷ Author YZ (Record ID: 60227322) was granted access to the database after successfully completing the required Collaborative Institutional Training Initiative (CITI) examination. The establishment of the MIMIC-IV database received ethical clearance from the Institutional Review Boards of both the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center. Due to the retrospective and anonymized nature of the data, the requirement for informed consent was waived. In addition, patients with acute pancreatitis admitted to Changshu Hospital affiliated to Soochow University between January 2019 and March 2024 were included as an external validation cohort. The study protocol was reviewed and approved by the Ethics Committee of Changshu Hospital affiliated to Soochow University (Approval No. L2024026). Due to the retrospective design and the use of routine clinical data, the ethics committee waived the need for written informed consent. All study procedures adhered strictly to the ethical principles outlined in the Declaration of Helsinki.

Participants

In the MIMIC-IV database, adult patients (aged ≥ 18 years) with AP were identified using the International Classification of Diseases, 9th Revision (ICD-9, code 577.0) and 10th Revision (ICD-10, code K85%). For patients with multiple ICU admissions, only data from the first admission were included. For the external validation cohort, the diagnosis of AP was established in strict accordance with the Guidelines for diagnosis and treatment of acute pancreatitis in China (2021).¹⁸ The primary outcome was the development of ARDS during hospitalization, which was defined according to the Berlin Definition.¹⁹ To ensure consistency between the cohorts, the following exclusion criteria were applied: (1) age < 18 years; (2) diagnosis of ARDS prior to or within the first 24 hours of ICU admission; (3) length of ICU stay < 24 hours. This temporal exclusion prevents data leakage and ensures the model functions as an early prognostic warning system rather than a concurrent diagnostic tool, preserving a genuine preemptive window for clinical intervention.
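The ICD-based screening rule can be sketched as a small predicate. This is a minimal illustration rather than the authors' extraction query: the function name `is_acute_pancreatitis` is hypothetical, the handling of codes written with or without a decimal point is an assumption, and the trailing `%` in `K85%` is read as the SQL wildcard, meaning any code that begins with K85.

```python
def is_acute_pancreatitis(icd_version: int, icd_code: str) -> bool:
    """Return True if a diagnosis code matches the AP definition in the text.

    Hypothetical helper: ICD-9 code 577.0, or any ICD-10 code starting with K85.
    """
    # Normalise so that "577.0" and "5770", or "K85.9" and "K859", compare equal.
    code = icd_code.strip().upper().replace(".", "")
    if icd_version == 9:
        return code == "5770"
    if icd_version == 10:
        return code.startswith("K85")
    return False


# Illustrative codes only; they are not taken from the study data.
examples = [(9, "577.0"), (9, "5771"), (10, "K85.90"), (10, "K86.1")]
matches = [is_acute_pancreatitis(version, code) for version, code in examples]
print(matches)
```

Matching on the K85 prefix keeps every ICD-10 subcode of acute pancreatitis while excluding neighbouring categories such as K86.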

The eligible patients from the MIMIC-IV database were randomly partitioned into a training set and an internal validation set at a ratio of 7:3. To ensure the comparability of these subgroups, a baseline characteristics analysis was conducted to confirm that the random split did not introduce distribution bias between the training and internal validation sets. The training set was utilized for variable selection and model construction, while both the internal and external validation sets were employed for model evaluation. The overall workflow of this study is illustrated in Figure 1.
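A seeded 7:3 random partition could look like the sketch below. The seed value, the function name `split_7_to_3`, and the use of integer patient indices are illustrative assumptions; the routine and seed actually used are not reported here, and the real subset sizes may differ from the floor-based cut shown.

```python
import random


def split_7_to_3(patient_ids, seed=42):
    """Randomly partition IDs into training and internal validation lists (7:3)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility; the seed is arbitrary
    cut = len(ids) * 7 // 10          # integer arithmetic avoids float rounding issues
    return ids[:cut], ids[cut:]


# 905 is the size of the MIMIC-IV cohort; the indices stand in for patient IDs.
train_ids, valid_ids = split_7_to_3(range(905))
print(len(train_ids), len(valid_ids))
```

Fixing the seed makes the partition reproducible, which matters because every later step (feature selection, tuning, evaluation) depends on which patients fall into each subset.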

Figure 1.

Flowchart of patient enrollment, model construction, and web-based calculator deployment. MIMIC, Medical Information Mart for Intensive Care; ICU, intensive care unit; RF, random forest; XGBoost, extreme gradient boosting; LightGBM, light gradient boosting machine; DT, decision tree; LR, logistic regression; SVM, support vector machine; KNN, k-nearest neighbors.

Data extraction and processing

Data were extracted from the MIMIC-IV database using Structured Query Language (SQL) via pgAdmin 4 software (version 6.21). All clinical and laboratory variables were collected within the first 24 hours of ICU admission. For variables with multiple recordings during this period, only the initial measurement was included. The extracted variables were categorized as follows: (1) Demographic information: age (years), gender, race (Asian, Black, White, Hispanic, other), height (cm), weight (kg), body mass index (BMI, kg/m²), and insurance (Medicare or other); (2) Vital signs: heart rate (beats/min), respiratory rate (breaths/min), systolic blood pressure (SBP, mmHg), diastolic blood pressure (DBP, mmHg), temperature (°C), and oxygen saturation (SpO₂, %); (3) Severity scores: Glasgow Coma Scale (GCS), SOFA score, and simplified acute physiology score II (SAPS II); (4) Laboratory data: white blood cell count (WBC, K/μL), hemoglobin (g/dL), platelet count (PLT, K/μL), C-reactive protein (CRP, mg/dL), alanine aminotransferase (ALT, U/L), aspartate aminotransferase (AST, U/L), albumin (ALB, g/dL), creatinine (mg/dL), sodium (mEq/L), potassium (mEq/L), chloride (mEq/L), calcium (mg/dL), blood urea nitrogen (BUN, mg/dL), hematocrit (%), glucose (mg/dL), international normalized ratio (INR), prothrombin time (PT, seconds), partial thromboplastin time (PTT, seconds), pH, partial pressure of oxygen (PO₂, mmHg), partial pressure of carbon dioxide (PCO₂, mmHg), base excess (BE, mEq/L), lactic acid (Lac, mEq/L), and bicarbonate (HCO₃^-, mEq/L); (5) Comorbidities: hypertension, diabetes, myocardial infarction, chronic obstructive pulmonary disease (COPD), asthma, acute kidney injury (AKI), sepsis, venous thromboembolism (VTE), and malignant cancer; and (6) Therapeutic interventions: central venous catheterization (CVC), cardiopulmonary bypass (CPB), continuous renal replacement therapy (CRRT), mechanical ventilation, and use of heparin, aspirin, antibiotics, and vasopressors. The same set of variables was extracted from the external validation cohort at Changshu Hospital affiliated to Soochow University.

Regarding missing data, variables with more than 30% missing values were excluded to minimize potential bias. For variables with less than 5% missing data, mean substitution was used. For variables with 5% to 30% missing data, multiple imputation was performed. Outliers were identified and treated as missing values, which were then handled using the imputation methods described above. After imputation, multicollinearity among the candidate variables was assessed using Spearman’s rank correlation coefficients. A correlation matrix was constructed and visualized as a heatmap (Figure 2); for each pair of variables with a correlation coefficient exceeding 0.8, one variable of the pair was excluded to minimize redundancy.
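The three-tier missing-data rule can be written as a short decision function. The sketch below is illustrative only: the function names, the use of `None` to mark missing values, and the toy columns are assumptions, and multiple imputation itself is not implemented here because the imputation software is not specified.

```python
def missing_data_strategy(values):
    """Map a column (None marks a missing value) to the handling rule in the text."""
    fraction = sum(v is None for v in values) / len(values)
    if fraction == 0:
        return "complete"
    if fraction > 0.30:
        return "exclude"
    if fraction < 0.05:
        return "mean substitution"
    return "multiple imputation"  # 5% to 30% missing


def mean_substitute(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical columns with 4%, 20%, and 40% missing values.
columns = {
    "col_4pct": [1.0] * 24 + [None],
    "col_20pct": [1.0] * 8 + [None] * 2,
    "col_40pct": [1.0] * 6 + [None] * 4,
}
strategies = {name: missing_data_strategy(col) for name, col in columns.items()}
print(strategies)
```

Under this reading, a column that is exactly 5% or exactly 30% missing falls into the multiple-imputation tier, since the text excludes only columns with more than 30% missing values.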

Figure 2.

Spearman’s rank correlation heatmap of candidate clinical variables. BMI: body mass index; HR: heart rate; RR: respiratory rate; SBP: systolic blood pressure; DBP: diastolic blood pressure; SpO2: oxygen saturation; GCS: Glasgow Coma Scale; SOFA: Sequential Organ Failure Assessment; SAPS II: simplified acute physiology score II; WBC: white blood cell count; PLT: platelet; ALT: alanine aminotransferase; BUN: blood urea nitrogen; INR: international normalized ratio; PT: prothrombin time; PTT: partial thromboplastin time; pH: potential of hydrogen; PO2: partial pressure of oxygen; PCO2: partial pressure of carbon dioxide; BE: base excess; Lac: lactic acid; HCO3-: bicarbonate; COPD: chronic obstructive pulmonary disease; ARF: acute respiratory failure; AKI: acute kidney injury; VTE: venous thromboembolism; CVC: central venous catheterization; CPB: cardiopulmonary bypass; CRRT: continuous renal replacement therapy.

Feature selection

To identify robust predictors while minimizing overfitting, a hybrid feature selection strategy was implemented.²⁰ The least absolute shrinkage and selection operator (LASSO) regression was initially applied to address multicollinearity and eliminate redundant variables by shrinking their coefficients to zero. The optimal regularization parameter (λ) was determined through 10-fold cross-validation based on minimum binomial deviance. Subsequently, to capture potential non-linear associations, the Boruta algorithm was employed. As a wrapper method based on the random forest (RF) classifier, Boruta confirms relevant features by comparing their importance Z-scores against those of randomized “shadow attributes.” Ultimately, to ensure model parsimony and clinical interpretability, a consensus strategy was adopted, whereby only variables independently identified by both LASSO and Boruta were retained for final model development.

Model development and evaluation

Based on the selected feature subset, seven ML algorithms were constructed to predict the onset of ARDS: extreme gradient boosting (XGBoost), RF, support vector machine (SVM), logistic regression (LR), light gradient boosting machine (LightGBM), k-nearest neighbors (KNN), and decision tree (DT). To optimize model performance and mitigate overfitting, hyperparameters were tuned using a grid search combined with five-fold cross-validation on the training dataset. Each model was evaluated in the internal validation set and subsequently tested in the independent external validation cohort. Performance was quantified using the area under the receiver operating characteristic curve (AUC), alongside sensitivity, specificity, accuracy, precision, F1 score, and the Brier score. The F1 score was calculated as 2 × precision × recall/(precision + recall), where recall is equivalent to sensitivity. The Brier score is the mean squared difference between the predicted probabilities and the observed outcomes. It reflects both discrimination and calibration, with lower scores indicating better predictions; a score of 0.25 corresponds to an uninformative model that assigns a probability of 0.5 to every patient, so values at or above this level indicate no useful predictive ability. Furthermore, the reliability and clinical utility of the model were assessed using calibration curves to examine the agreement between predicted and observed probabilities, and decision curve analysis (DCA) to determine the net clinical benefit across a range of threshold probabilities. To address potential methodological circularity associated with mechanical ventilation, an additional sensitivity analysis was conducted. We performed an ablation study by removing the ventilation feature and retraining the optimal RF model using the remaining eight variables. The discriminative performance of this reduced model was evaluated using the AUC across all datasets to assess the independent predictive value of the physiological markers.
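Two of the metrics above, the Brier score and the AUC, can be computed directly from their definitions. The following is a minimal pure-Python sketch on made-up labels and probabilities (not study data); the function names are illustrative, and the AUC is computed by pairwise comparison, which is equivalent to the area under the ROC curve but is only practical for small samples.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive is ranked above a random negative.

    Ties contribute 0.5, matching the usual definition of the ROC area.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = ARDS) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]

model_brier = brier_score(y_true, y_prob)
coin_flip_brier = brier_score(y_true, [0.5] * len(y_true))  # constant 0.5 prediction
model_auc = auc(y_true, y_prob)
print(model_brier, coin_flip_brier, model_auc)
```

The constant-0.5 predictor scores exactly 0.25 regardless of the outcomes, which is why that value is commonly used as the reference point for an uninformative model.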

Model interpretability and deployment strategy

To address the inherent “black-box” nature of machine learning and enhance clinical transparency, the SHapley Additive exPlanations (SHAP) method was employed to elucidate the model’s decision-making process. A multi-dimensional visualization strategy was adopted to interpret predictions from both global and local perspectives. Global feature importance was primarily quantified using the SHAP summary bar plot, while the bee swarm plot was concurrently utilized to visualize the directional impact and distribution of each feature’s contribution to ARDS risk. Partial Dependence Plots (PDP) were computed to visualize the marginal effects of key continuous variables, thereby capturing potential non-linear associations and clinically relevant threshold effects. For individual-level interpretation, SHAP force plots were constructed to decompose the specific risk factors driving a single prediction, demonstrating how distinct features shift the model’s output from the baseline. Ultimately, to facilitate the translation of this complex algorithm into bedside practice, the optimal model was deployed as a user-friendly, interactive web-based calculator using the Streamlit framework (https://streamlit.io).
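The additive decomposition that underlies a SHAP force plot can be illustrated with exact Shapley values on a toy model. This sketch makes several simplifying assumptions: `toy_risk` is a hypothetical two-feature score and not the study's RF model, a single reference patient stands in for the background data that SHAP normally averages over, and all feature orderings are enumerated, which is feasible only for a handful of features.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)            # start from the reference patient
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to the patient's value
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical risk score with an interaction term (illustration only)."""
    score = 0.05 + 0.02 * x["sofa"] + 0.15 * x["ventilation"]
    return score + 0.03 * x["sofa"] * x["ventilation"]


baseline = {"sofa": 2, "ventilation": 0}   # assumed reference patient
patient = {"sofa": 8, "ventilation": 1}    # assumed patient of interest
phi = shapley_values(toy_risk, patient, baseline)
print(phi)
```

The contributions sum to the difference between the patient's prediction and the baseline prediction, which is the property a force plot visualizes; the interaction term is shared between the two features rather than assigned to either one.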

Statistical analysis

Continuous variables were assessed for normality using the Kolmogorov-Smirnov test. Normally distributed data were expressed as mean ± standard deviation (SD) and compared using the Student's t-test, while non-normally distributed data were presented as median [interquartile range (IQR)] and compared using the Mann-Whitney U test. Categorical variables were reported as frequencies (percentages) and analyzed using the Chi-square test or Fisher’s exact test.

All statistical analyses in the current study were completed using R software (Version 4.1.2), Stata software (Version 16.0), and Python software (Version 3.13). Two-sided P-values <0.05 were considered statistically significant.

Results

Baseline characteristics

The study population consisted of 905 patients in the MIMIC-IV cohort and 126 patients in the external validation cohort. Of these, 226 of 905 patients (25.0%) in the MIMIC-IV cohort and 26 of 126 patients (20.6%) in the external cohort developed ARDS. The baseline characteristics of all patients are summarized in Table 1. In the MIMIC-IV cohort, patients who developed ARDS had significantly higher BMI, heart rate, respiratory rate, and body temperature than those without ARDS (all P ≤ 0.001). Laboratory parameters revealed higher WBC counts, creatinine, glucose, PTT, and PCO₂ levels, as well as lower pH values in the ARDS group (all P < 0.05). Additionally, ARDS patients had significantly higher illness severity scores, as indicated by elevated SOFA and SAPS II scores (both P < 0.001). Comorbidities such as AKI and sepsis were significantly more common in the ARDS group (both P < 0.001). Furthermore, these patients were more likely to have received therapeutic interventions, including mechanical ventilation, CRRT, antibiotics, and vasopressors (all P < 0.001). In the external validation cohort, similar patterns were observed regarding disease severity and specific biomarkers. ARDS patients had significantly higher BMI (P = 0.009), WBC counts (P = 0.010), and creatinine levels (P = 0.011). Consistent with the MIMIC-IV cohort, patients in the external cohort who developed ARDS presented with higher SOFA and SAPS II scores (both P < 0.001) and had a higher prevalence of sepsis (P = 0.026). These findings highlight significant differences in metabolic status, inflammatory markers, and clinical severity between patients with and without ARDS across both cohorts, emphasizing the relevance of these factors in early risk prediction.

Table 1.

Baseline characteristics of the MIMIC-IV and external validation cohorts.

Variables	MIMIC-IV cohort (n=905)			External validation cohort (n=126)
Variables	Non-ARDS (n=679)	ARDS (n=226)	P Value	Non-ARDS (n=100)	ARDS (n=26)	P Value
Demographic data
Age, median (Q1, Q3)	60 (46, 74)	57 (46, 68)	0.053	52 (40, 66)	44 (34, 56)	0.111
Gender (male), n (%)	385 (57)	134 (59)	0.545	50 (50.0)	15 (58)	0.632
BMI, median (Q1, Q3)	26.24 (23.66, 30.66)	30.36 (25.31, 34.8)	< 0.001	20.91 (19.38, 24.10)	23.50 (21.06, 27.53)	0.009
Race, n (%)			0.274	-	-	-
Asian	23 (3)	7 (3)
Black	64 (9)	15 (7)
White	433 (64)	141 (62)
Hispanic	27 (4)	6 (3)
Other	132 (19)	57 (25)
Insurance, n (%)			0.620			0.412
Other	170 (25)	61 (27)		65 (65.0)	14 (54)
Medicare	509 (75)	165 (73)		35 (35.0)	12 (46)
Vital signs
HR, median (Q1, Q3)	99 (84, 113)	106 (90, 119)	0.001	100 (80, 116)	109 (93, 124)	0.088
RR, median (Q1, Q3)	20 (17, 24)	23 (18, 28)	< 0.001	19 (15, 24)	25 (17, 28)	0.350
SBP, mean ± SD	127.44 ± 24.64	122.08 ± 28.35	0.012	127.34 ± 25.06	120.04 ± 23.05	0.165
DBP, mean ± SD	73.38 ± 18.62	70.97 ± 19.41	0.561	71.23 ± 17.96	67.77 ± 16.76	0.361
Temperature, median (Q1, Q3)	36.9 (36.6, 37.2)	37.1 (36.6, 37.6)	< 0.001	36.8 (36.6, 37.2)	37.0 (36.5, 37.4)	0.346
SpO₂, median (Q1, Q3)	97 (94, 99)	96 (94, 99)	0.065	96 (94, 98)	95 (92, 98)	0.201
Severity scoring system
GCS, median (Q1, Q3)	15 (14, 15)	15 (14, 15)	0.567	15 (14, 15)	15 (14, 15)	0.075
SOFA, median (Q1, Q3)	5 (2, 8)	8 (5, 11)	< 0.001	4 (2, 7)	9 (5, 13)	< 0.001
SAPS II, median (Q1, Q3)	33 (23, 44)	43 (32, 58)	< 0.001	32 (23, 42)	48 (38, 62)	< 0.001
Laboratory data
WBC, median (Q1, Q3)	12.2 (7.9, 17.05)	14.85 (10.1, 21.37)	< 0.001	10.6 (7.3, 16.6)	12.9 (10.1, 22.4)	0.010
Hemoglobin, mean ± SD	11.00 ± 2.38	10.89 ± 2.52	0.561	11.45 ± 2.10	11.20 ± 2.32	0.622
PLT, median (Q1, Q3)	185 (126.5, 261.5)	194.5 (130.75, 271.5)	0.276	174 (132, 234)	205 (137, 225)	0.423
ALT, median (Q1, Q3)	47 (23, 122.5)	40 (23, 106)	0.272	84 (34, 147)	45 (24, 130)	0.238
Creatinine, median (Q1, Q3)	1.0 (0.7, 1.7)	1.2 (0.8, 2.1)	0.002	1.0 (0.7, 1.5)	1.7 (1.0, 2.6)	0.011
Sodium, median (Q1, Q3)	138 (134, 141)	138 (135, 142)	0.173	138 (135, 141)	139 (137, 143)	0.095
Potassium, median (Q1, Q3)	4.0 (3.6, 4.4)	4.05 (3.7, 4.6)	0.014	4.0 (3.6, 4.4)	4.0 (3.5, 4.6)	0.969
Chloride, median (Q1, Q3)	103 (99, 108)	104 (100, 109)	0.035	104 (100, 108)	108 (102, 112)	0.028
BUN, median (Q1, Q3)	18 (11, 34)	24 (16, 42)	< 0.001	27 (13, 42)	31 (20, 40)	0.442
Glucose, median (Q1, Q3)	126 (100, 162)	139 (106, 181.75)	0.002	133 (105, 165)	128 (107, 197)	0.986
INR, median (Q1, Q3)	1.3 (1.1, 1.53)	1.3 (1.2, 1.6)	0.035	1.2 (1.1, 1.4)	1.4 (1.2, 1.6)	0.026
PTT, median (Q1, Q3)	30.6 (27.2, 37.2)	32.2 (27.83, 39.9)	0.009	35.1 (25.7, 47.1)	35.5 (26.9, 44.9)	0.902
pH, median (Q1, Q3)	7.37 (7.3, 7.42)	7.33 (7.24, 7.41)	< 0.001	7.38 (7.31, 7.43)	7.33 (7.21, 7.38)	0.027
PO₂, median (Q1, Q3)	103 (78, 164)	90 (75, 117)	< 0.001	93 (76, 126)	112 (88, 165)	0.129
PCO₂, median (Q1, Q3)	39 (34, 45)	42 (35, 49)	< 0.001	38 (34, 42)	40 (32, 49)	0.503
BE, median (Q1, Q3)	-2 (-6, 1)	-3 (-8, 0)	0.062	-2 (-6, 0)	-3 (-10, 0)	0.197
Lac, median (Q1, Q3)	1.8 (1.2, 2.7)	1.8 (1.2, 3.1)	0.468	1.6 (1.2, 2.2)	2.3 (1.3, 3.3)	0.094
HCO₃^-, median (Q1, Q3)	21 (19, 25)	22 (18, 25)	0.437	22 (18, 25)	19 (16, 23)	0.177
Comorbidities, n (%)
Hypertension	320 (47)	102 (45)	0.657	41 (41)	10 (38)	0.991
Diabetes	86 (13)	27 (12)	0.867	12 (12)	1 (4)	0.300
Myocardial infarction	12 (2)	0 (0)	0.045	1 (1)	0 (0)	1.000
COPD	25 (4)	10 (4)	0.762	3 (3)	2 (8)	0.274
Asthma	58 (9)	20 (9)	0.995	6 (6.0)	5 (19.2)	0.033
AKI	409 (60)	184 (81)	< 0.001	58 (58)	17 (65)	0.646
Sepsis	423 (62)	196 (87)	< 0.001	35 (35)	16 (62)	0.026
VTE	41 (6)	20 (9)	0.191	5 (5.0)	2 (7.7)	0.593
Malignant cancer	67 (10)	21 (9)	0.902	4 (4.0)	4 (15.4)	0.034
Therapeutic interventions, n (%)
Central venous catheterization	83 (12)	33 (15)	0.417	25 (25.0)	8 (30.8)	0.551
CPB	9 (1)	4 (2)	0.747	1 (1)	0 (0)	1.000
CRRT	81 (12)	71 (31)	< 0.001	32 (32)	15 (58)	0.029
Ventilation	178 (26)	141 (62)	< 0.001	32 (32)	11 (42)	0.450
Heparin	630 (93)	218 (96)	0.070	93 (93)	24 (92)	1.000
Aspirin	216 (32)	72 (32)	1.000	28 (28)	9 (35)	0.676
Antibiotic	582 (86)	225 (100)	< 0.001	79 (79)	26 (100)	0.007
Vasopressors	141 (21)	107 (47)	< 0.001	20 (20)	15 (58)	< 0.001

MIMIC: Medical Information Mart for Intensive Care; ARDS: acute respiratory distress syndrome; BMI: body mass index; HR: heart rate; RR: respiratory rate; SBP: systolic blood pressure; DBP: diastolic blood pressure; SpO₂: oxygen saturation; GCS: Glasgow Coma Scale; SOFA: Sequential Organ Failure Assessment; SAPS II: simplified acute physiology score II; WBC: white blood cell count; PLT: platelet; ALT: alanine aminotransferase; BUN: blood urea nitrogen; INR: international normalized ratio; PTT: partial thromboplastin time; pH: potential of hydrogen; PO₂: partial pressure of oxygen; PCO₂: partial pressure of carbon dioxide; BE: base excess; Lac: lactic acid; HCO₃^-: bicarbonate; COPD: chronic obstructive pulmonary disease; ARF: acute respiratory failure; AKI: acute kidney injury; VTE: venous thromboembolism; CPB: cardiopulmonary bypass; CRRT: continuous renal replacement therapy.

Additionally, the comparison of clinical parameters between the randomly assigned training and internal validation sets is presented in Supplemental Table S1. The vast majority of demographic and clinical variables showed no statistically significant differences between the two subsets. Although a few laboratory parameters (hemoglobin, ALT, potassium, BUN, and PO₂) had P-values < 0.05, as can be expected by chance when many variables are compared, the absolute differences were clinically negligible. This supports the overall comparability of the training and internal validation sets, providing a sound basis for model training.

Key variables

Before model construction, C-reactive protein, aspartate aminotransferase, albumin, and calcium were excluded due to a missing rate exceeding 30%. Subsequently, Spearman’s rank correlation analysis was performed to assess multicollinearity. Hematocrit and prothrombin time were excluded as they exhibited correlation coefficients greater than 0.8 with hemoglobin and international normalized ratio, respectively, to prevent redundancy.
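The redundancy screen can be sketched with a small pure-Python implementation of Spearman's rank correlation. The values below are invented for illustration and are not study data; comparing the absolute value of the coefficient with 0.8 is an assumption, and the choice of which variable in a flagged pair to drop (hematocrit rather than hemoglobin, for example) remains a judgment that this sketch does not automate.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def redundant_pairs(columns, threshold=0.8):
    """List variable pairs whose absolute Spearman correlation exceeds the threshold."""
    names = list(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(spearman(columns[a], columns[b])) > threshold
    ]


# Hypothetical measurements for six patients.
columns = {
    "hemoglobin": [8.1, 9.5, 10.2, 11.0, 12.4, 13.3],
    "hematocrit": [25.0, 28.9, 30.1, 33.5, 37.0, 40.2],
    "platelets": [210, 150, 320, 95, 180, 260],
}
flagged = redundant_pairs(columns)
print(flagged)
```

Because Spearman's coefficient works on ranks, it flags any strongly monotonic relationship, not only linear ones, which suits skewed laboratory values.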

The remaining variables were subjected to the hybrid feature selection process. The LASSO regression analysis, using 10-fold cross-validation to minimize the binomial deviance, identified nine variables with non-zero coefficients: BMI, RR, temperature, SOFA score, WBC, PO₂, PCO₂, ventilation, and antibiotic use (Figure 3(a) and (b)). Concurrently, the Boruta algorithm confirmed 22 features as relevant for prediction: age, BMI, RR, SBP, temperature, SpO₂, GCS, SOFA score, SAPS II, WBC, creatinine, BUN, pH, PO₂, PCO₂, lactate, HCO₃^-, sepsis, CRRT, ventilation, antibiotic use, and vasopressor use (Figure 3(c)).

Figure 3.

Feature selection process using LASSO regression and the Boruta algorithm. (A) Coefficient profile plotted against the logarithm of the lambda sequence. (B) Cross-validation plot for determining the optimal penalty term. (C) Importance ranking of features identified by the Boruta algorithm. CPB: cardiopulmonary bypass; CVC: central venous catheterization; INR: international normalized ratio; AKI: acute kidney injury; HR: heart rate; DBP: diastolic blood pressure; SBP: systolic blood pressure; HCO3-: bicarbonate; CRRT: continuous renal replacement therapy; RR: respiratory rate; GCS: Glasgow Coma Scale; pH: potential of hydrogen; BMI: body mass index; SOFA: Sequential Organ Failure Assessment.

Based on the consensus strategy, the intersection of variables selected by both algorithms was retained. Consequently, nine key variables including BMI, RR, temperature, SOFA score, WBC, PO₂, PCO₂, ventilation, and antibiotic use were ultimately determined as the predictors for model development.
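The consensus step amounts to a set intersection of the two selections. The sketch below uses the feature names reported for each method; the variable names and the plain-text spellings of the features are illustrative choices, not the authors' code.

```python
# Features with non-zero LASSO coefficients, as reported above.
lasso_selected = [
    "BMI", "RR", "temperature", "SOFA", "WBC",
    "PO2", "PCO2", "ventilation", "antibiotic use",
]

# Features confirmed by Boruta, as reported above.
boruta_confirmed = [
    "age", "BMI", "RR", "SBP", "temperature", "SpO2", "GCS", "SOFA",
    "SAPS II", "WBC", "creatinine", "BUN", "pH", "PO2", "PCO2", "lactate",
    "HCO3-", "sepsis", "CRRT", "ventilation", "antibiotic use",
    "vasopressor",  # vasopressor therapy, the last item in the Boruta list
]

# Keep only features chosen by both methods, preserving the LASSO order.
boruta_set = set(boruta_confirmed)
consensus = [feature for feature in lasso_selected if feature in boruta_set]
print(len(consensus), consensus)
```

Here every LASSO-selected variable is also confirmed by Boruta, so the consensus set coincides with the LASSO set; the intersection would be smaller whenever the two methods disagree.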

Model development and validation

Based on the nine key predictors identified, seven ML algorithms were constructed and optimized. Initial evaluation in the training cohort revealed that the RF model exhibited superior learning capabilities, achieving the highest AUC among all classifiers (Figure 4(a)). Internal validation based on ROC analysis further showed that the RF model demonstrated the best predictive performance, achieving an AUC of 0.851 (95% CI: 0.803-0.899), followed by LightGBM (AUC = 0.840), SVM (AUC = 0.837), LR (AUC = 0.833), XGBoost (AUC = 0.833), KNN (AUC = 0.807), and DT (AUC = 0.768) (Figure 4(b)). Detailed performance metrics for each model are presented in Table 2. While some models excelled in specific metrics, the RF model exhibited the most balanced and robust performance profile across all evaluation indicators, maintaining high specificity (0.887) and accuracy (0.807) alongside its superior discriminatory power.

Figure 4.

Receiver operating characteristic (ROC) curves of the seven machine learning models and two traditional clinical scoring systems in the training and validation sets. (A) ROC curves of the training set. (B) ROC curves of the internal validation set. (C) ROC curves of the external validation set. AUC, area under the ROC curve; XGBoost, extreme gradient boosting; LightGBM, light gradient boosting machine; SVM, support vector machine; KNN, k-nearest neighbors; SOFA, Sequential Organ Failure Assessment; SAPS II, simplified acute physiology score II.

Table 2.

Comprehensive evaluation of machine learning model performance in the internal validation set.

Methods	Sensitivity	Specificity	Accuracy	Precision	F1	Brier score	AUC	AUC (95% CI)
RF	0.567	0.887	0.807	0.623	0.594	0.144	0.851	0.803-0.899
XGBoost	0.224	0.975	0.789	0.750	0.345	0.141	0.833	0.781-0.886
LightGBM	0.269	0.956	0.785	0.667	0.383	0.139	0.840	0.786-0.891
DT	0.313	0.916	0.767	0.553	0.400	0.151	0.768	0.700-0.834
LR	0.701	0.768	0.752	0.500	0.584	0.166	0.833	0.779-0.882
SVM	0.358	0.951	0.804	0.706	0.475	0.138	0.837	0.786-0.885
KNN	0.313	0.951	0.793	0.677	0.429	0.143	0.807	0.751-0.867

AUC, area under the ROC curve; XGBoost, extreme gradient boosting; RF, random forest; SVM, support vector machine; LR, logistic regression; LightGBM, light gradient boosting machine; KNN, k-nearest neighbors; DT, decision tree.
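As a consistency check, the F1 values in Table 2 can be recomputed from the reported sensitivity (recall) and precision. The sketch below copies those three columns from the table; because the inputs are themselves rounded to three decimals, small discrepancies in the last digit are expected, and the tolerance used here is an arbitrary choice.

```python
def f1_score(precision, recall):
    """F1 = 2 x precision x recall / (precision + recall)."""
    return 2 * precision * recall / (precision + recall)


# model: (sensitivity, precision, reported F1), copied from Table 2.
table2 = {
    "RF": (0.567, 0.623, 0.594),
    "XGBoost": (0.224, 0.750, 0.345),
    "LightGBM": (0.269, 0.667, 0.383),
    "DT": (0.313, 0.553, 0.400),
    "LR": (0.701, 0.500, 0.584),
    "SVM": (0.358, 0.706, 0.475),
    "KNN": (0.313, 0.677, 0.429),
}

# Absolute gap between the recomputed and the reported F1 for each model.
gaps = {
    model: abs(f1_score(precision, sensitivity) - reported)
    for model, (sensitivity, precision, reported) in table2.items()
}
largest_gap = max(gaps.values())
print(largest_gap < 0.002)
```

The check also makes the trade-off in the table explicit: models with very high specificity but low sensitivity, such as XGBoost, are penalized by F1 even when their AUC is close to that of the RF model.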

External validation using the independent cohort from Changshu Hospital further supported the robustness and generalizability of the RF model (Supplemental Table S2). It achieved an AUC of 0.823 (95% CI: 0.735-0.899), maintaining stable discrimination in a distinct population (Figure 4(c)). To benchmark the clinical utility of our proposed framework, we compared the predictive performance of the optimal RF model against established clinical severity scoring systems, specifically the SOFA and SAPS II scores. As illustrated in Figure 4, the RF model outperformed these traditional scores across all cohorts. In the internal validation set, the RF model’s AUC of 0.851 was higher than that of the SOFA score (AUC = 0.725) and the SAPS II score (AUC = 0.710). Similarly, in the external validation cohort, the RF model (AUC = 0.823) retained an advantage over the SOFA score (AUC = 0.713) and the SAPS II score (AUC = 0.772). The consistent performance observed across both internal and external validation cohorts underscores the clinical potential of the RF model for the early prediction of ARDS in AP patients.

Furthermore, the calibration curve of the RF model demonstrated good agreement between the predicted probabilities and the actual observed ARDS risks (Figure 5(a)), indicating high reliability. Decision curve analysis further confirmed the clinical utility of the model, revealing that the RF model yielded a superior net benefit over a broad range of threshold probabilities in comparison with the default strategies of universal intervention or no intervention (Figure 5(b)).

Figure 5.

(a) Calibration curve of the random forest (RF) model in the internal validation set. (b) Decision curve analysis (DCA) of the RF model in the internal validation set. XGBoost, extreme gradient boosting; LightGBM, light gradient boosting machine; SVM, support vector machine; KNN, k-nearest neighbors.
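The net benefit plotted in a decision curve has a simple closed form: at a threshold probability p_t, net benefit = TP/N − (FP/N) × p_t/(1 − p_t), where treating no one has a net benefit of zero. The sketch below is a minimal illustration of that standard formula on synthetic toy data; the function names are hypothetical and the values are not from the study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Universal intervention: equivalent to predicting a risk of 1.0 for everyone."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Synthetic toy data (NOT study data): 1 = ARDS, 0 = no ARDS
labels = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
probs = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.45, 0.60, 0.35, 0.80]

for pt in (0.1, 0.2, 0.3, 0.4):
    model_nb = net_benefit(labels, probs, pt)
    all_nb = net_benefit_treat_all(labels, pt)
    print(f"threshold {pt:.1f}: model {model_nb:.3f}, treat-all {all_nb:.3f}, treat-none 0.000")
```

A model shows clinical utility at a given threshold when its net benefit exceeds both the treat-all value and zero.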

A sensitivity analysis was performed to mitigate the circularity bias related to mechanical ventilation. After excluding this feature, the retrained eight-feature model maintained robust discrimination in the internal validation set with an AUC of 0.818. Notably, the AUC in the external validation cohort increased to 0.861, as shown in Supplemental Figure S1. These findings indicate that the model’s fundamental predictive power is driven by underlying physiological and laboratory derangements such as SOFA, BMI, and PO₂. The results also suggest that reliance on objective physiological metrics can enhance cross-institutional generalizability by reducing the influence of local clinical practices.

Interpretation of predictive features

The SHAP summary plot ranked the relative importance of the predictors, with mechanical ventilation emerging as the primary factor, followed by SOFA score, BMI, PO₂, RR, PCO₂, WBC, temperature, and antibiotic use (Figure 6(a)). The bee swarm plot (Figure 6(b)) further elucidated the directionality of these effects, revealing that the requirement for mechanical ventilation, along with elevated values of SOFA, BMI, RR, and PCO₂, contributed positively to the predicted risk of ARDS. Conversely, higher PO₂ levels and the absence of mechanical ventilation were associated with a lower probability of the outcome.

Figure 6.

SHAP summary plot for clinical variables contributing to the random forest (RF) model. (a) Feature importance ranking plot based on the RF model. (b) Scatter plot of variables for SHAP analysis based on the RF model. SHAP: SHapley Additive exPlanations; SOFA: sequential organ failure assessment; BMI: body mass index; PO2: partial pressure of oxygen; RR: respiratory rate; PCO2: partial pressure of carbon dioxide; WBC: white blood cell.
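To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a toy three-feature risk function by enumerating every coalition of features. This is only an illustration under stated assumptions: `toy_risk`, the patient values, and the single reference patient are hypothetical, and practical SHAP analyses of tree ensembles rely on optimized estimators with a background dataset rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical toy risk function with an interaction term (NOT the study's RF model)
def toy_risk(f):
    return 0.10 + 0.20 * f["ventilation"] + 0.01 * f["sofa"] + 0.01 * f["ventilation"] * f["sofa"]


patient = {"ventilation": 1, "sofa": 11, "bmi": 31.0}    # hypothetical patient
reference = {"ventilation": 0, "sofa": 5, "bmi": 25.0}   # hypothetical baseline

phi = shapley_values(toy_risk, patient, reference)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

Two properties are worth noting: the contributions sum to the difference between the patient's prediction and the baseline prediction, and a feature the model ignores (here `bmi`) receives a contribution of zero.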

To further quantify these associations, PDPs were generated to visualize the marginal effect of the top six features (Figure 7). Mechanical ventilation was associated with a substantial elevation in predicted ARDS risk, rising from a baseline of approximately 16% to 43% (Figure 7(a)). The SOFA score showed a strong association with ARDS risk, characterized by a steep upward trajectory between 5 and 13 points, after which the curve reached a plateau (Figure 7(b)). BMI exhibited a complex non-linear relationship with the outcome: the predicted risk remained near baseline below 27 kg/m², then escalated sharply, stabilized between 30 and 40 kg/m², and rose again at higher BMI values (Figure 7(c)). For oxygenation status, the model captured a precipitous rise in risk as PO₂ levels dropped below approximately 170 mmHg (Figure 7(d)). Analysis of respiratory parameters revealed that the predicted probability increased markedly when the RR surpassed 25 breaths/min (Figure 7(e)). Similarly, PCO₂ exhibited a predominantly monotonic increasing trend, with a more pronounced elevation in ARDS risk once PCO₂ exceeded 45 mmHg (Figure 7(f)). These patterns indicate that the RF model captured physiologically meaningful thresholds beyond simple linear correlations.

Figure 7.

Partial dependence plots (PDPs) of the top six features based on the random forest (RF) model. SOFA: sequential organ failure assessment; BMI: body mass index; PO2: partial pressure of oxygen; RR: respiratory rate; PCO2: partial pressure of carbon dioxide.
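A partial dependence curve is obtained by forcing one feature to each value on a grid for every patient, leaving the other features unchanged, and averaging the resulting predictions. The sketch below illustrates this procedure with a hypothetical step-shaped risk function; the 170 mmHg step only mirrors the threshold described in the text for illustration, and neither the function nor the rows come from the study.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical toy risk function (NOT the study's RF model)
def toy_risk(row):
    risk = 0.10
    if row["po2"] < 170:           # illustrative step at 170 mmHg
        risk += 0.15
    if row["ventilation"] == 1:
        risk += 0.25
    return risk


# Synthetic patients (NOT study data)
rows = [
    {"po2": 90, "ventilation": 1},
    {"po2": 150, "ventilation": 0},
    {"po2": 200, "ventilation": 0},
    {"po2": 240, "ventilation": 1},
]

po2_grid = [100, 150, 200, 250]
pd_po2 = partial_dependence(toy_risk, rows, "po2", po2_grid)
pd_vent = partial_dependence(toy_risk, rows, "ventilation", [0, 1])

for value, risk in zip(po2_grid, pd_po2):
    print(f"PO2 = {value} mmHg: mean predicted risk {risk:.3f}")
print(f"ventilation 0 vs 1: {pd_vent[0]:.3f} vs {pd_vent[1]:.3f}")
```

Because the curve averages over the observed values of the other features, it describes the marginal effect of one feature and can hide interactions, which is one reason individual-level explanations are reported alongside it.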

On an individual level, SHAP force plots (Figure 8) further illustrated these patterns by quantifying the specific contribution of each feature to a patient’s risk score. For example, in a high-risk case (Figure 8(a)), the patient’s predicted probability of developing ARDS was 0.50, primarily driven by a high SOFA score (11.0), the requirement for mechanical ventilation, and an elevated WBC (12.6). In contrast, for a lower-risk patient (Figure 8(b)), the predicted probability was 0.06. Despite a SOFA score of 8.0 and a PO₂ of 85.0 mmHg, the overall risk remained low owing to the combined protective effects of the absence of mechanical ventilation, a lower BMI (22.54), a normal RR (20.0), and a PCO₂ of 36 mmHg.

Figure 8.

SHAP force plots illustrating individual prediction explanations based on the random forest (RF) model. (a) A high-risk example. (b) A low-risk example. SHAP: SHapley Additive exPlanations; SOFA: sequential organ failure assessment; BMI: body mass index; PO2: partial pressure of oxygen; RR: respiratory rate; PCO2: partial pressure of carbon dioxide; WBC: white blood cell.

Development of the web-based calculator

Based on the validated predictors and the optimal RF model, an interactive web-based calculator was developed to bridge the gap between algorithmic complexity and clinical application (Figure 9). To ensure complete consistency with the underlying algorithm, the interface integrates all nine key predictors, including BMI, RR, temperature, SOFA score, WBC, PO₂, PCO₂, ventilation status, and antibiotic use. By inputting these parameters via a user-friendly interface, clinicians can instantly obtain the predicted probability of ARDS. This tool is freely accessible at a dedicated website (https://rf-model-6t6jrrgfn4fmdaesdceunt.streamlit.app/) to support rapid, personalized risk stratification in real-time clinical settings.

Figure 9.

An online web calculator based on the random forest (RF) machine learning model. SOFA: sequential organ failure assessment; BMI: body mass index; PO2: partial pressure of oxygen; RR: respiratory rate; PCO2: partial pressure of carbon dioxide; WBC: white blood cell.
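One practical detail of any such calculator is that the nine inputs must reach the model in a fixed order and with valid binary flags. The sketch below shows one way this could be organized; the class name, field names, ordering, and example values are all hypothetical and are not taken from the deployed application.

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class ArdsCalculatorInput:
    """The nine predictors named in the text; names and order are assumptions."""
    bmi: float
    respiratory_rate: float
    temperature: float
    sofa: float
    wbc: float
    po2: float
    pco2: float
    ventilation: int      # 1 = mechanically ventilated, 0 = not
    antibiotic_use: int   # 1 = antibiotics given, 0 = not

    def __post_init__(self):
        # Binary flags must be exactly 0 or 1
        for name in ("ventilation", "antibiotic_use"):
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1")

    def to_vector(self):
        """Feature vector in declaration order, ready to pass to a fitted model."""
        return [float(v) for v in astuple(self)]


FEATURE_ORDER = [f.name for f in fields(ArdsCalculatorInput)]

# Hypothetical example values (NOT a patient from the study)
example = ArdsCalculatorInput(
    bmi=24.0, respiratory_rate=20.0, temperature=37.0, sofa=6.0, wbc=10.0,
    po2=95.0, pco2=38.0, ventilation=0, antibiotic_use=1,
)
print(FEATURE_ORDER)
print(example.to_vector())
```

Keeping the feature order in one place reduces the risk that the interface and the trained model silently disagree about which column is which.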

Discussion

In this multicenter retrospective study, we developed and validated an ML-based framework to predict the development of ARDS in critically ill patients with AP. Our results demonstrate that the RF model outperformed six other common algorithms, achieving an AUC of 0.851 in the internal validation set and maintaining an AUC of 0.823 in the independent external cohort. This consistent performance is notable, as many previously reported prediction models for pancreatitis-associated ARDS were derived from single-center cohorts and showed substantial performance degradation when applied to external populations.^7,11,21

Several recent studies have highlighted the challenges of generalizability in ARDS prediction, particularly in heterogeneous ICU populations where variations in disease severity, management strategies, and patient demographics can substantially influence model performance.^22–24 In this context, the preserved discrimination observed in our external cohort suggests that the proposed model may better accommodate population heterogeneity, supporting its potential applicability across different clinical settings. By integrating a hybrid feature selection strategy and multiple ML algorithms, we identified nine clinically accessible predictors. Importantly, rather than assuming linear or monotonic associations, we explored the functional relationships between key predictors and ARDS risk using SHAP and partial dependence analyses, allowing a more nuanced interpretation of selected variables.²⁵ A web-based risk prediction tool was subsequently implemented, allowing for instantaneous risk assessment and serving as a clinically actionable resource to guide timely interventions in patients at high risk. This strategy aligns with recent recommendations emphasizing that interpretability is essential for the clinical translation of ML models in critical care, particularly for high-stakes outcomes such as ARDS.^11,26

The SHAP-based interpretability analysis revealed that mechanical ventilation was the most influential predictor of ARDS development. Partial dependence analysis demonstrated a marked increase in predicted ARDS risk among patients requiring ventilatory support, with the estimated probability rising from approximately 16% to over 40%. This finding is consistent with prior observational studies reporting that invasive ventilation in AP often reflects advanced respiratory compromise and may exacerbate lung injury through ventilator-induced stress in already inflamed pulmonary tissue.^5,27 Our results do not imply a direct causal role of ventilation itself, but rather reinforce its value as a composite marker of disease severity and early lung vulnerability. The SOFA score was another dominant contributor to ARDS risk, with the partial dependence curve showing a steep increase in risk between SOFA scores of approximately 5 and 13. This observation parallels prior reports demonstrating a strong association between escalating SOFA scores and subsequent development of ARDS and multiple organ dysfunction in critically ill patients.^28,29 Notably, the non-linear risk gradient observed in this intermediate range suggests that patients may transition rapidly from compensated organ dysfunction to overt pulmonary failure, a phenomenon that may be underestimated by traditional linear risk models.

BMI emerged as an important predictor with a distinct non-linear risk profile. The predicted probability of ARDS increased sharply once BMI exceeded approximately 27 kg/m². This finding is concordant with existing literature linking obesity to impaired respiratory mechanics, reduced functional residual capacity, and a chronic pro-inflammatory milieu that predisposes patients to lung injury during systemic inflammatory states.³⁰ Our results extend these observations by identifying a clinically relevant BMI range at which ARDS risk begins to accelerate in patients with AP. Furthermore, respiratory and gas exchange parameters also demonstrated clinically meaningful threshold effects. The model identified a pronounced increase in ARDS risk when arterial oxygen tension fell below approximately 170 mmHg, respiratory rate exceeded 25 breaths per minute, and arterial carbon dioxide tension rose above 45 mmHg. These thresholds are broadly concordant with recent physiologic and critical care studies describing early deterioration in ventilatory reserve preceding overt hypoxemic respiratory failure.^7,19,31 While factors such as elevated BMI and impaired oxygenation are established general risks, our partial dependence analysis transforms these known concepts into objective, quantifiable clinical triggers. Identifying non-linear tipping points, such as the steep risk escalation at a BMI of 27 kg/m² or a PO₂ dropping below 170 mmHg, provides precise parameters that traditional linear scoring systems lack. These data-driven thresholds deliver actionable insights, enabling clinicians to identify impending respiratory failure and optimize the timing of preventive interventions before irreversible deterioration occurs.

When compared with conventional statistical models and clinical scoring systems, the present machine learning approach offers several advantages. Traditional tools such as APACHE II, Ranson, and BISAP rely on linear and additive assumptions and were not specifically designed to predict pulmonary complications in AP.^5,9 Prior comparative studies have shown that such scores demonstrate only modest discrimination for ARDS and frequently fail to capture higher-order interactions among physiologic variables.⁸ Our direct benchmark analysis confirms this limitation, as traditional metrics like SOFA and SAPS II achieved substantially lower discriminative power (AUCs ranging from 0.710 to 0.772) compared to our optimal RF model across both internal and external cohorts. In contrast, the RF algorithm can integrate weak but complementary predictors and model complex interactions without prespecified assumptions, which may explain the superior discrimination observed in our study.³² The model’s practical value was further validated by evaluating its clinical utility, which showed a superior net gain over a broad interval of risk thresholds relative to the default approaches of universal or no intervention. These findings are consistent with recent ML-based ICU studies showing that improved discrimination can translate into meaningful clinical benefit by supporting more selective escalation of monitoring and preventive interventions.^7,11 Such an approach may help optimize resource allocation while minimizing unnecessary interventions in low-risk patients.

A key strength of this study is its emphasis on clinical explainability and real-world application. The lack of transparency has been repeatedly cited as a major barrier to clinician acceptance of ML models in critical care.^11,33 By combining global feature importance rankings, partial dependence visualization, and individualized risk estimation, our framework provides clinicians with insight into both population-level risk patterns and patient-specific drivers of prediction. The accompanying web-based calculator further facilitates translation into clinical practice by enabling rapid, individualized ARDS risk assessment using routinely available variables, an approach increasingly advocated in precision ICU medicine.^13,20

Despite these strengths, certain weaknesses of our work should be noted. First, the retrospective design may introduce selection and diagnostic ascertainment bias. We excluded patients diagnosed with ARDS within the first 24 hours of ICU admission, which, while effectively preventing data leakage and ensuring the model functions as a prognostic warning system, limits its applicability to rapid-onset phenotypes. Second, our framework relies on static data from the initial 24-hour window and does not fully capture the temporal progression of critical illness. Incorporating longitudinal or time-series physiological data, such as trajectories of oxygenation or evolving organ failure scores, represents a promising future direction to further refine predictive accuracy and clinical relevance.³⁴ Regarding variable selection, although the inclusion of mechanical ventilation involves a degree of methodological circularity, sensitivity analysis confirmed that the underlying physiological derangements retain robust independent predictive power. Furthermore, the external validation cohort is characterized by a limited and unbalanced sample size, which results in wide confidence intervals and constrains the precision of calibration and subgroup assessments. Additionally, this study evaluates acute pancreatitis as a homogeneous entity and does not account for the divergent pathophysiological pathways of specific etiological subphenotypes.³⁵ Differences in ethnicity, disease etiology, and clinical practice patterns between regions may further influence model performance.¹⁰ Finally, a significant gap remains between risk prediction and actionable clinical decision-making. Future research should employ causal inference frameworks, such as target trial emulation, to determine whether specific interventions guided by these predictions can ultimately improve clinical outcomes.³⁶

Conclusion

This study introduces a generalizable, interpretable, and high-performing RF model for predicting ARDS in critically ill patients with AP. By capturing clinically meaningful non-linear physiological thresholds and translating predictions into a transparent, user-friendly interface, the model represents a substantive improvement over traditional risk assessment tools. Its application may facilitate earlier recognition of high-risk patients and support targeted management, ultimately contributing to improved outcomes in this vulnerable population.

Supplemental material

Supplemental material for An interpretable machine learning model for predicting acute respiratory distress syndrome in critically ill patients with acute pancreatitis: A multicenter retrospective study by Sheng Yan, Xia Ren, Chunyang Xu, Feng Zheng, Luojie Liu, Shun Wen, Xiaodan Xu and Yan Zhang in Digital Health.

Footnotes

Acknowledgements

We would like to express our gratitude to the clinical staff at the Department of Critical Care Medicine and the Department of Emergency Medicine at Changshu Hospital Affiliated to Soochow University for their support in clinical data curation.

ORCID iDs

Xia Ren

Yan Zhang

Ethical considerations

The study was approved by the Institutional Review Boards of MIT, Beth Israel Deaconess Medical Center, and the Ethics Committee of Changshu Hospital affiliated with Soochow University (Approval No. L2024026). The requirement for informed consent was waived due to the retrospective and anonymous nature of the data. This research adhered to the Declaration of Helsinki.

Consent for publication

Not applicable. This study utilized anonymized data from the public MIMIC-IV database and retrospective, de-identified clinical data from our institution. The ethics committee waived the requirement for individual informed consent.

Author contributions

Sheng Yan: Conceptualization, Methodology, Software, Formal analysis, Visualization, Writing – original draft. Xia Ren: Methodology, Investigation, Data curation, Validation, Writing – original draft. Chunyang Xu: Data curation, Software, Validation. Feng Zheng: Investigation, Visualization, Writing – review & editing. Luojie Liu: Resources, Data curation, Investigation (External data collection). Shun Wen: Formal analysis, Validation, Writing – review & editing. Xiaodan Xu: Formal analysis, Validation, Writing – review & editing, Funding acquisition. Yan Zhang: Conceptualization, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the Suzhou Special Project for Clinical Key Disease Diagnosis and Treatment Technologies (LCZX202334), Key Projects of the Changshu Science and Technology Development Program (CSWS202209), and the Special Research Fund Projects of the China International Medical Foundation (Z-2014-08-2309-1).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. The MIMIC-IV database is publicly accessible to credentialed users via PhysioNet.

Data guarantor

Yan Zhang, as the corresponding author, accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish.

Supplemental material

Supplemental material for this article is available online.

References

1. Iannuzzi, King, Leong, et al. Global Incidence of Acute Pancreatitis Is Increasing Over Time: A Systematic Review and Meta-Analysis. Gastroenterology 2022; 162: 122–134. https://doi.org/10.1053/j.gastro.2021.09.043

2. Beij, Verdonk, van Santvoort, et al. Acute Pancreatitis: An Update of Evidence-Based Management and Recent Trends in Treatment Strategies. United European Gastroenterology Journal 2025; 13: 97–106. https://doi.org/10.1002/ueg2.12743

3. Xiao, Zhu, Chen, et al. Exploring the therapeutic role of early heparin administration in ARDS management: a MIMIC-IV database analysis. J Intensive Care 2024; 12: 9. https://doi.org/10.1186/s40560-024-00723-5

4. Banks, Bollen, Dervenis, et al. Classification of acute pancreatitis--2012: revision of the Atlanta classification and definitions by international consensus. Gut 2013; 62: 102–111. https://doi.org/10.1136/gutjnl-2012-302779

5. Zhang, Kuang, et al. The risk factors for acute respiratory distress syndrome in patients with severe acute pancreatitis: A retrospective analysis. Medicine (Baltimore) 2021; 100: e23982. https://doi.org/10.1097/MD.0000000000023982

6. Ibadov, Arifjanov, Ibragimov, et al. Acute respiratory distress-syndrome in the general complications of severe acute pancreatitis. Annals of Hepato-Biliary-Pancreatic Surgery 2019; 23: 359–364. https://doi.org/10.14701/ahbps.2019.23.4.359

7. Zhang, Pang. Early prediction of acute respiratory distress syndrome complicated by acute pancreatitis based on four machine learning models. Clinics (Sao Paulo) 2023; 78: 100215. https://doi.org/10.1016/j.clinsp.2023.100215

8. Lin, Han, et al. A prediction model for acute respiratory distress syndrome among patients with severe acute pancreatitis: a retrospective analysis. Ther Adv Respir Dis 2022; 16: 17534666221122592. https://doi.org/10.1177/17534666221122592

9. Ding, Guo, Song, et al. Nomogram for the Prediction of In-Hospital Incidence of Acute Respiratory Distress Syndrome in Patients with Acute Pancreatitis. The American Journal of the Medical Sciences 2022; 363: 322–332. https://doi.org/10.1016/j.amjms.2021.08.009

10. Zhang, Xiong, et al. Development and external validation of models to predict acute respiratory distress syndrome related to severe acute pancreatitis. World J Gastroenterol 2022; 28: 2123–2136. https://doi.org/10.3748/wjg.v28.i19.2123

11. Zheng, Jiang, et al. A prediction model for predicting the risk of acute respiratory distress syndrome in sepsis patients: a retrospective cohort study. BMC Pulm Med 2023; 23: 78. https://doi.org/10.1186/s12890-023-02365-z

12. Mao, Ling, Pan, et al. Machine learning for the prediction of in-hospital mortality in patients with spontaneous intracerebral hemorrhage in intensive care unit. Scientific Reports 2024; 14: 14195. https://doi.org/10.1038/s41598-024-65128-8

13. Peng, Huang, Liu, et al. Interpretable machine learning for 28-day all-cause in-hospital mortality prediction in critically ill patients with heart failure combined with hypertension: A retrospective cohort study based on medical information mart for intensive care database-IV and eICU databases. Frontiers in Cardiovascular Medicine 2022; 9: 994359. https://doi.org/10.3389/fcvm.2022.994359

14. et al. Machine learning predictors of risk of death within 7 days in patients with non-traumatic subarachnoid hemorrhage in the intensive care unit: A multicenter retrospective study. Heliyon 2024; 10: e23943. https://doi.org/10.1016/j.heliyon.2023.e23943

15. Zou, Ren, Huang, et al. The role of artificial neural networks in prediction of severe acute pancreatitis associated acute respiratory distress syndrome: A retrospective study. Medicine (Baltimore) 2023; 102: e34399. https://doi.org/10.1097/MD.0000000000034399

16. Bai, Weng, et al. Application of interpretable machine learning algorithms to predict distant metastasis in osteosarcoma. Cancer Medicine 2023; 12: 5025–5034. https://doi.org/10.1002/cam4.5225

17. Tong, et al. Association between glucose-to-lymphocyte ratio and mortality in patients with heart failure from the MIMIC-IV database: a retrospective cohort study. Scientific Reports 2025; 15: 21131. https://doi.org/10.1038/s41598-025-08349-9

18. Pancreatic Surgery Group SSoCMA. Guidelines for diagnosis and treatment of acute pancreatitis in China (2021). Chin J Surg 2021; 20: 730–739. https://doi.org/10.3760/cma.j.cn115610-20210622-00297

19. Ranieri, Rubenfeld, Thompson, et al. Acute respiratory distress syndrome: the Berlin Definition. JAMA 2012; 307: 2526–2533. https://doi.org/10.1001/jama.2012.5669

20. Sha, Jiang, Bai. An interpretable machine learning model for predicting sepsis-induced cardiomyopathy in ICU patients: development and validation using the MIMIC-IV database. Health Information Science and Systems 2025; 13: 49. https://doi.org/10.1007/s13755-025-00367-1

21. Wang, Song, et al. Clinical Use of Nomogram Based on Machine Learning for Diagnosis Prediction of Acute Respiratory Distress Syndrome in Patients With Acute Pancreatitis. Mediators of Inflammation 2025; 2025: 5610316. https://doi.org/10.1155/mi/5610316

22. Rubulotta, Bahrami, Marshall, et al. Machine Learning Tools for Acute Respiratory Distress Syndrome Detection and Prediction. Critical Care Medicine 2024; 52: 1768–1780. https://doi.org/10.1097/ccm.0000000000006390

23. Lin, Yang, Liu, et al. A pretrain-finetune approach for improving model generalizability in outcome prediction of acute respiratory distress syndrome patients. International Journal of Medical Informatics 2024; 186: 105397. https://doi.org/10.1016/j.ijmedinf.2024.105397

24. Goecks, Jalili, Heiser, et al. How Machine Learning Will Transform Biomedicine. Cell 2020; 181: 92–101. https://doi.org/10.1016/j.cell.2020.03.022

25. Chen, Lundberg, Lee. Explaining a series of models by propagating Shapley values. Nature Communications 2022; 13: 4512. https://doi.org/10.1038/s41467-022-31384-3

26. Pinsky, Bedoya, Bihorac, et al. Use of artificial intelligence in critical care: opportunities and obstacles. Critical Care (London, England) 2024; 28: 113. https://doi.org/10.1186/s13054-024-04860-z

27. Gajendran, Prakash, Perisetti, et al. Predictors and outcomes of acute respiratory failure in hospitalised patients with acute pancreatitis. Frontline Gastroenterology 2021; 12: 478–486. https://doi.org/10.1136/flgastro-2020-101496

28. Karakike, Kyriazopoulou, Tsangaris, et al. The early change of SOFA score as a prognostic marker of 28-day sepsis mortality: analysis through a derivation and a validation cohort. Critical Care (London, England) 2019; 23: 387. https://doi.org/10.1186/s13054-019-2665-5

29. Zhang, Chang, Ding, et al. To Establish an Early Prediction Model for Acute Respiratory Distress Syndrome in Severe Acute Pancreatitis Using Machine Learning Algorithm. J Clin Med 2023; 12: 1718. https://doi.org/10.3390/jcm12051718

30. Hansen SEJ, Madsen, Varbo, et al. Body Mass Index, Triglycerides, and Risk of Acute Pancreatitis: A Population-Based Study of 118 000 Individuals. The Journal of Clinical Endocrinology and Metabolism 2020; 105: 163. https://doi.org/10.1210/clinem/dgz059

31. Bos LDJ, Ware. Acute respiratory distress syndrome: causes, pathophysiology, and phenotypes. Lancet (London, England) 2022; 400: 1145–1156. https://doi.org/10.1016/s0140-6736(22)01485-4

32. Huang, Yang, Zhao, et al. Machine Learning-Based Prediction of Early Complications Following Surgery for Intestinal Obstruction: Multicenter Retrospective Study. Journal of Medical Internet Research 2025; 27: e68354. https://doi.org/10.2196/68354

33. Lundberg, Lee. A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 2017.

34. Duggal, Scheraga, Sacha, et al. Forecasting disease trajectories in critical illness: comparison of probabilistic dynamic systems to static models to predict patient status in the intensive care unit. BMJ Open 2024; 14: e079243. https://doi.org/10.1136/bmjopen-2023-079243

35. Yang, Zhang, et al. Identification of clinical subphenotypes of sepsis after laparoscopic surgery. Laparoscopic, Endoscopic and Robotic Surgery 2024; 7: 16–26. https://doi.org/10.1016/j.lers.2024.02.001

36. Yang, Wang, Chen, et al. A comprehensive step-by-step approach for the implementation of target trial emulation: Evaluating fluid resuscitation strategies in post-laparoscopic septic shock as an example. Laparoscopic, Endoscopic and Robotic Surgery 2025; 8: 28–44. https://doi.org/10.1016/j.lers.2025.01.001
