Abstract
Objectives
This study aimed to develop a multimodal predictive model that integrates clinical data, radiomics, and three-dimensional deep learning to forecast acute respiratory distress syndrome in patients with acute pancreatitis.
Methods
This retrospective study analyzed data from 759 patients with acute pancreatitis treated at three hospitals. Radiomics features were extracted from three-dimensional computed tomography images, and a three-dimensional deep learning model was developed using convolutional networks. These components were combined with clinical data using the XGBoost algorithm to construct a multimodal model. The performance of the model was compared with that of single-modal models and traditional scoring systems (Modified Computed Tomography Severity Index, Ranson score, and Bedside Index for Severity in Acute Pancreatitis), using area under the curve as the primary metric. Model interpretability was enhanced using variable importance analysis, SHapley Additive exPlanations, local interpretable model-agnostic explanations, calibration plots, and decision curve analysis.
Results
The multimodal model achieved area under the curve values of 0.872 (training set) and 0.876 (test set), outperforming traditional scores (Modified Computed Tomography Severity Index: 0.747 and 0.759; Ranson score: 0.575 and 0.568; and Bedside Index for Severity in Acute Pancreatitis: 0.748 and 0.757, respectively) and single-modal models (radiomics: 0.638 and 0.727 and deep learning: 0.756 and 0.727, respectively).
Conclusion
By integrating clinical tabular data, radiomics, and deep learning features, the multimodal model can predict the risk of acute respiratory distress syndrome in patients with acute pancreatitis at an early stage.
Keywords
Introduction
Acute pancreatitis (AP) is a common inflammatory disorder of the pancreas with a highly variable clinical course, ranging from mild, self-limiting disease to life-threatening severe forms.1,2 AP complicated by acute respiratory distress syndrome (ARDS) represents a particularly dangerous condition and is associated with significant morbidity and mortality. 3 Previous studies have highlighted that ARDS is the most common and severe pulmonary complication of AP, with reported mortality rates as high as 44.5% due to the rapid progression of multiorgan failure. The pathophysiological mechanisms underlying ARDS in AP involve systemic inflammatory response syndrome (SIRS), cytokine storm, and damage to the alveolar-capillary membrane, which collectively lead to refractory hypoxemia. 4 ARDS exacerbates pancreatic injury through bidirectional interactions, as pancreatic enzymes and inflammatory mediators circulate to the lungs, causing endothelial dysfunction, edema, and microthrombosis. 5 Without early prediction and aggressive ventilatory support (e.g. lung-protective strategies and extracorporeal membrane oxygenation (ECMO)), ARDS can contribute to prolonged intensive care unit (ICU) stays, secondary infections, and irreversible organ damage.6,7 Thus, ARDS associated with AP remains a critical determinant of poor prognosis and necessitates early risk stratification. 8
In clinical practice, patient management relies heavily on the integration of diverse types of information, including tabular laboratory data, imaging, and audio data. 9 Multimodal models that integrate data from multiple sources have demonstrated superior performance in medical tasks compared with single-modality models. These tasks include detection, classification, prediction, and prognosis.10–12 However, several challenges remain. For instance, different data modalities have distinct dimensional characteristics: clinical variables derived from electronic health records (EHRs) are typically one-dimensional, whereas imaging data are often two-dimensional or three-dimensional (3D). 13 In terms of feature extraction, high-level features are often manually identified or extracted using specialized software applications. 10
This study aimed to develop and validate predictive models for early and noninvasive prediction of ARDS by integrating data from multiple modalities across three medical centers. The data modalities included EHRs, radiomics, and deep learning features extracted from 3D computed tomography (CT) scans.
Methods
Study design
This was a retrospective multicenter study. Data extracted from EHRs and imaging systems were obtained anonymously from three hospitals: Jintan Hospital affiliated to Jiangsu University, Suzhou Kowloon Hospital, and the First Affiliated Hospital of Soochow University. Patients from Jintan Hospital and Suzhou Kowloon Hospital were assigned to the training set, while those from the First Affiliated Hospital of Soochow University were assigned to the independent testing set.
Adult patients aged over 18 years who were admitted with a diagnosis of AP between January 2017 and December 2023 were included. The exclusion criteria were as follows: (a) chronic liver or renal disease; (b) hematological disease; (c) chronic, recurrent, idiopathic, or traumatic pancreatitis; pancreatic tumors; or a history of pancreatic resection; (d) pregnancy; and (e) a history of chemotherapy or radiotherapy. The study flowchart is presented in Figure 1.

Figure 1. Flowchart of the study.
Ethical considerations
This retrospective cohort study was approved by the Ethics Committee of the First Affiliated Hospital of Soochow University (approval number: 2022098). The need for informed consent was waived for this retrospective study, and all participants were managed in accordance with established guidelines.6,14 ARDS was diagnosed according to the Berlin definition. 15
Data collection
Data preparation
This study employed a retrospective approach to systematically collect clinical data, including demographic information (age and sex), clinical characteristics (e.g. etiological classification, including biliary, hyperlipidemia, alcoholic, and other causes), and comorbidities (e.g. hypertension and diabetes). Blood samples were collected within 24 h of admission for all patients to perform complete blood cell analysis, liver and kidney function tests, electrolyte measurements, lipid profiling, C-reactive protein assessment, and other laboratory evaluations. Abdominal noncontrast CT scans in DICOM format were obtained within 72 h of admission and scored using the Modified CT Severity Index (MCTSI). 16 To comprehensively evaluate patient condition, the SIRS score, Bedside Index for Severity in Acute Pancreatitis (BISAP), and Ranson score were calculated within 24 h of AP onset to further quantify disease severity. 17
Data cleaning
To ensure data quality and prevent potential data leakage, rigorous data cleaning procedures were conducted separately for each dataset. Variables with missing rates exceeding 20% were excluded to minimize the impact of incomplete data on analytical results. As shown in Figure 1, 79 patients were excluded from the training set because of missing data, and 51 patients were excluded from the test set. Table S1 lists the exact missing rates (%) for each clinical feature included in the multimodal model.
For the remaining variables with missing values, imputation was performed using the classification and regression tree (CART) algorithm. Classification trees were used for categorical variables, while regression trees were applied to continuous variables.
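The variable-level filter described in this subsection can be sketched as follows. This is a minimal illustration in Python rather than the code used in the study; the function names, the use of `None` to mark a missing value, and the toy records are assumptions made for the example.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def missing_rates(records: List[Record]) -> Dict[str, float]:
    """Return the fraction of records in which each variable is missing (None)."""
    variables = {name for record in records for name in record}
    n = len(records)
    return {
        name: sum(1 for record in records if record.get(name) is None) / n
        for name in variables
    }


def drop_sparse_variables(records: List[Record], max_missing: float = 0.20) -> List[Record]:
    """Keep only variables whose missing rate does not exceed max_missing."""
    rates = missing_rates(records)
    kept = sorted(name for name, rate in rates.items() if rate <= max_missing)
    return [{name: record.get(name) for name in kept} for record in records]


# Toy data (hypothetical values): "pct" is missing in 2 of 5 records (40%),
# "ca" in 1 of 5 (20%, which does not exceed the cutoff).
toy = [
    {"crp": 8.3, "pct": None, "ca": 1.95},
    {"crp": 120.0, "pct": 0.19, "ca": 2.10},
    {"crp": 45.2, "pct": None, "ca": 2.05},
    {"crp": 60.1, "pct": 1.20, "ca": None},
    {"crp": 33.0, "pct": 0.40, "ca": 2.20},
]
cleaned = drop_sparse_variables(toy)
```

In this toy example only `pct` is removed; the remaining missing `ca` value would then be handled by imputation.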
Development of models
With the occurrence of ARDS within 1 week of admission defined as the outcome variable, multiple predictive models were developed using different modalities: (a) radiomics; (b) 3D deep learning; and (c) a multimodal approach integrating clinical tabular data, radiomics, and deep learning.
Radiomics model
Feature extraction. The collected CT images were processed using 3D Slicer software (version 5.0.3) to manually delineate the pancreatic lesion regions and generate a 3D pancreatic model. Subsequently, 107 radiomics features were extracted using the built-in “Radiomics” module, which included seven categories: first-order statistics, gray-level co-occurrence matrix, gray-level dependence matrix, gray-level run-length matrix, gray-level size-zone matrix, neighboring gray-tone difference matrix, and 3D-based shape features. Principal component (PC) analysis was applied to the 107 radiomics features for dimensionality reduction through linear transformation.12,18,19 PCs accounting for over 80% of the total explained variance were selected for subsequent modeling.
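The selection rule for PCs can be illustrated with a short sketch. It assumes that the explained-variance ratios of the PCs are already available and sorted in decreasing order; the function name and the example ratios are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def components_for_variance(ratios: Sequence[float], threshold: float = 0.80) -> int:
    """Smallest number of leading PCs whose cumulative explained variance exceeds threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative > threshold:
            return k
    raise ValueError("threshold is not reached by the supplied components")


# Hypothetical explained-variance ratios, sorted in decreasing order.
example_ratios = [0.40, 0.25, 0.10, 0.08, 0.05, 0.04, 0.03, 0.03, 0.02]
k = components_for_variance(example_ratios)
```

With these illustrative ratios, the first four PCs are retained because their cumulative explained variance is the first to exceed 80%.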
Consistency analysis. To ensure the robustness of the radiomics model, the intraclass correlation coefficient (ICC) was used to evaluate the consistency of region of interest (ROI) delineation and feature extraction. Two experienced radiologists independently delineated the ROIs, and both interobserver and intraobserver consistency tests were conducted to validate the reliability of the model. ICC values range from 0 to 1, and consistency was categorized into three levels: low (ICC < 0.4), moderate (ICC between 0.4 and 0.75), and high (ICC > 0.75).
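The three consistency levels can be written as a small helper. This sketch only encodes the cutoffs stated above; the function name is hypothetical, and assigning the boundary values 0.4 and 0.75 to the moderate category is an assumption.

```python
def icc_category(icc: float) -> str:
    """Map an ICC value to the three consistency levels used in the text."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1 here")
    if icc < 0.4:
        return "low"
    if icc <= 0.75:
        return "moderate"
    return "high"
```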
Model development. After standardization, feature selection was performed using differential testing, Pearson correlation analysis, and Least Absolute Shrinkage and Selection Operator (LASSO) regression. This pipeline reduced the 12 PCs (retaining >80% variance from the original 107 radiomics features) to 5 final features, which were used to train the XGBoost model (R CRAN version: 1.7.8.1). The final model output served as a binary classifier to predict the occurrence of ARDS.
Deep learning model
To construct a binary deep learning predictive model for AP complicated by ARDS, two key steps were undertaken: (a) development of a pancreatic 3D segmentation model for noncontrast CT and (b) construction of a classification model based on semantically segmented 3D CT images.
Pancreatic 3D segmentation model. This study used the U-Net architecture as the baseline framework. The input noncontrast CT images were preprocessed by cropping to a size of 228 × 228 pixels, applying histogram equalization to enhance image contrast and maximize information entropy, and normalizing pixel values to eliminate scale-related effects. During model training, the Adam optimizer was used with a batch size of 20 and 200 epochs to ensure adequate data features and model coverage. Additionally, real-time data augmentation was employed to dynamically transform images during training, effectively enhancing the model’s generalization ability.
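Two of the preprocessing steps named above, histogram equalization and intensity normalization, can be illustrated with a short sketch. It assumes a single slice that has already been converted to 8-bit integer intensities, which the text does not specify; the function names are hypothetical, and cropping and augmentation are omitted.

```python
from typing import List

Image = List[List[int]]


def equalize_histogram(image: Image, levels: int = 256) -> Image:
    """Histogram-equalize an integer image with intensities in [0, levels - 1]."""
    flat = [v for row in image for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in image]


def normalize(image: Image, levels: int = 256) -> List[List[float]]:
    """Rescale integer intensities to the range [0, 1]."""
    return [[v / (levels - 1) for v in row] for row in image]


# Toy 2 x 2 slice with hypothetical intensities.
toy_slice = [[52, 55], [61, 59]]
equalized = equalize_histogram(toy_slice)
scaled = normalize(equalized)
```

Equalization spreads the four distinct intensities of the toy slice evenly over the full 0 to 255 range before they are rescaled to the unit interval.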
Deep learning classification model. Building on the pancreatic 3D segmentation model, a deep learning classification model was developed. This model takes 3D CT images of the pancreas as input and outputs the probability of the primary outcome. Specifically, the 3D images are fed into a 3D ResNet50 encoder, where high-level features are extracted through a deep convolutional network and then passed to a fully connected layer that generates the predicted probability of AP complicated by ARDS.
Multimodal fusion model
To fully leverage the complementary strengths of different data modalities, we constructed a multimodal fusion model using the XGBoost algorithm. Specifically, the predicted probabilities (i.e. continuous risk scores) generated by the radiomics model and the 3D deep learning model were used as two input features. These probabilities were combined with all available clinical tabular variables, following data cleaning and imputation, as candidate predictors. During training of the XGBoost fusion model, feature importance was automatically computed, and the top eight clinical features—ranked according to their contribution to prediction—were identified post hoc from the final model (see section “Variable importance of the multimodal model” and Figure 2(a)). Thus, the final multimodal model inputs consisted of the following: (a) the radiomics prediction score, (b) the deep learning prediction score, and (c) the top eight clinical variables selected based on XGBoost-derived feature importance within the fusion framework.
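The structure of the fusion input can be sketched as follows. This is an illustration of how the two upstream prediction scores and the selected clinical variables might be concatenated into one feature vector, not the code used in the study; the function name, the lowercase variable names, their ordering, and the patient values are assumptions, and the sketch stops before any call to XGBoost.

```python
from typing import Dict, List, Sequence

# Hypothetical short names for the eight selected clinical variables.
CLINICAL_FEATURES: Sequence[str] = ("sirs", "ca", "n", "cr", "crp", "pct", "glu", "alb")


def fusion_row(pred_rad: float, pred_dl: float, clinical: Dict[str, float]) -> List[float]:
    """Concatenate the two upstream scores with the selected clinical variables."""
    missing = [name for name in CLINICAL_FEATURES if name not in clinical]
    if missing:
        raise KeyError(f"missing clinical variables: {missing}")
    return [pred_rad, pred_dl] + [float(clinical[name]) for name in CLINICAL_FEATURES]


# Hypothetical patient record, for illustration only.
patient = {"sirs": 1, "ca": 1.95, "n": 11.7, "cr": 80.0,
           "crp": 8.3, "pct": 0.19, "glu": 7.48, "alb": 35.0}
row = fusion_row(0.12, 0.30, patient)
```

Each row built this way has ten entries, and a matrix of such rows would serve as the input to the gradient-boosted fusion classifier.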

Figure 2. Global explainability of the multimodal model. (a) Variable importance of the multimodal model and (b) SHAP plots: Each point represents one case. The closer the variable value is to 1.0 on the x-axis, the higher the probability of ARDS. ARDS: acute respiratory distress syndrome; SHAP: SHapley Additive exPlanations.
Statistical analysis
Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC) was calculated to evaluate the diagnostic performance of different models and existing scoring systems. The DeLong test was used to compare AUCs among the models and scoring systems.
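For readers less familiar with the metric, the AUC can be computed directly from its rank interpretation: the probability that a randomly chosen patient with the outcome receives a higher score than a randomly chosen patient without it. The sketch below is a minimal pure-Python illustration with hypothetical labels and scores; it does not implement the DeLong test.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = ARDS) and predicted scores.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
auc = roc_auc(y_true, y_score)
```

In this toy example, eight of the nine positive-negative pairs are ranked correctly, giving an AUC of 8/9. The pairwise loop is quadratic in the sample size and is meant for illustration only.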
Explainability and visualization
Variable importance was visualized using bar charts to illustrate the contribution of individual features to the prediction task. SHapley Additive exPlanations (SHAP) is a method for explaining predictions generated by machine learning models. It is based on Shapley values from cooperative game theory, which provide a principled way to distribute a model’s prediction fairly among the input features according to their contributions. Local interpretable model-agnostic explanations (LIME), in contrast, explain individual predictions by approximating the model’s behavior around a specific case and relating the relevant features to that prediction.20,21
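The idea behind Shapley values can be made concrete with a toy calculation. The sketch below computes exact Shapley values by averaging each feature's marginal contribution over all feature orderings; the value function `toy_value` and its numbers are hypothetical and do not represent the model in this study. Exact enumeration is feasible only for a handful of features, and practical SHAP implementations rely on faster algorithms.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, List


def shapley_values(features: List[str], value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for ordering in orderings:
        present: FrozenSet[str] = frozenset()
        for name in ordering:
            with_feature = present | {name}
            totals[name] += value(with_feature) - value(present)
            present = with_feature
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical model output when only the features in `coalition` are known."""
    score = 0.10  # baseline risk with no features known
    if "pred_dl" in coalition:
        score += 0.30
    if "pred_rad" in coalition:
        score += 0.10
    if "pred_dl" in coalition and "pred_rad" in coalition:
        score += 0.20  # interaction term, shared equally by the two features
    if "crp" in coalition:
        score += 0.05
    return score


phi = shapley_values(["pred_dl", "pred_rad", "crp"], toy_value)
```

The resulting values sum to the difference between the full prediction (0.75) and the baseline (0.10), which is the additivity property that makes SHAP attributions interpretable.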
Calibration plots (also known as calibration curves) assess how well a model’s predicted probabilities align with the actual outcomes, reflecting the accuracy of the predicted probabilities. If the calibration curve closely follows the 45-degree reference line, it indicates that the model’s predicted probabilities are highly accurate. 22
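One simple way to summarize the deviation from the reference line is a binned calibration error. The sketch below is an illustration under stated assumptions: it groups predictions into equal-width bins and averages the absolute gap between the observed event rate and the mean predicted probability, weighted by bin size. The binning scheme, function name, and data are hypothetical, and the mean absolute error reported in this study may have been computed differently.

```python
from typing import List, Sequence


def binned_calibration_error(labels: Sequence[int], probs: Sequence[float], n_bins: int = 5) -> float:
    """Sample-weighted mean |observed event rate - mean predicted probability| over bins."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equally long")
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append(i)
    error = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_pred = sum(probs[i] for i in members) / len(members)
        observed = sum(labels[i] for i in members) / len(members)
        error += abs(observed - mean_pred) * len(members) / len(labels)
    return error


# Hypothetical outcomes and predicted probabilities.
example_error = binned_calibration_error([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9])
```

A value of 0 corresponds to predictions that lie exactly on the 45-degree line in every bin.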
Decision curve analysis (DCA) assesses the clinical utility of a predictive model by plotting its net benefit across a range of threshold probabilities. The net benefit is calculated from the true positives and false positives, with false positives weighted by the relative harm implied by the threshold probability. DCA plots show the range of threshold probabilities over which a model provides a positive net benefit, indicating its potential usefulness in clinical practice, and allow multiple models to be compared by identifying which provides the highest net benefit at each threshold.
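The net benefit at a threshold probability pt is conventionally computed as TP/n − (FP/n) × pt/(1 − pt), where TP and FP are the numbers of true and false positives among n patients. The sketch below is a minimal illustration of this formula with hypothetical data; the function names are assumptions, and the reference strategy of treating no patients has a net benefit of 0 by definition.

```python
from typing import Sequence


def net_benefit(labels: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk is at least `threshold`."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Net benefit of the 'All' strategy: every patient is treated as high risk."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Hypothetical outcomes (1 = ARDS) and predicted probabilities.
y = [1, 1, 0, 0, 0]
p = [0.9, 0.6, 0.7, 0.2, 0.1]
nb_model = net_benefit(y, p, threshold=0.5)
nb_all = net_benefit_treat_all(y, threshold=0.5)
```

Evaluating these functions over a grid of thresholds yields the curves shown in a DCA plot; a model is useful at a given threshold when its net benefit exceeds that of both reference strategies.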
Software versions and platforms
Statistical analyses were conducted using R (version 4.1.0) and Python (version 3.9). For the development of machine learning algorithms, the tidymodels platform (version 0.2.0) was utilized. Deep learning models were constructed using the Keras Python platform with TensorFlow version 2.8.0 as the backend. The hardware used for this study was a Mac mini equipped with an 8-core Apple M1 processor (with an integrated graphics processing unit (GPU)) and 16 GB of memory.
Results
Characteristics of patients in the study
As shown in Table 1, a total of 515 patients with AP were included in the training set, of whom 183 developed ARDS. In the test set, 244 patients with AP were included, of whom 101 developed ARDS. In the training set, biliary disease was the most common etiological cause regardless of ARDS status; it was observed in 38.9% of patients without ARDS and 43.7% of patients with ARDS. Hyperlipidemia was the second most common cause (21.1% and 27.9%), followed by alcohol-related etiologies (8.1% and 4.4%, respectively). Similarly, in the test set, biliary factors remained the most common cause (51.7% and 48.5%, respectively).
Table 1. Characteristics of patients at admission.
Continuous variables are presented as mean (standard deviation) or median (interquartile range); categorical variables are presented as count (percentage). p-values indicate comparisons between the two groups.
ALP: alkaline phosphatase; ALT: glutamic-pyruvic transaminase; AST: glutamic-oxaloacetic transaminase; CRP: C-reactive protein; GGT: glutamyl transpeptidase; L: lymphocyte; N: neutrophil; PLT: platelet; SIRS: systemic inflammatory response syndrome; TG: total triglycerides; WBC: white blood cell; Ca2+: calcium ion.
Variable importance of the multimodal model
This study developed a multimodal prediction model for ARDS that integrated clinical tabular variables with prediction scores derived from deep learning and radiomics models. During model development, the XGBoost algorithm was used to rank and incorporate variables according to their relative importance (Figure 2(a)). Specifically, the prediction scores derived from the deep learning and radiomics models were ranked as the most important features, followed by the presence of SIRS, calcium ion levels, neutrophil count, creatinine, C-reactive protein, procalcitonin, blood glucose, and albumin.
General explainability of the multimodal model: SHAP plots
By aggregating SHAP values across all instances, we obtained global insights into feature importance and interactions, enabling the identification of the most influential features in the model. As shown in Figure 2(b), features with values closer to 1.0 were associated with a higher likelihood of developing ARDS.
Local explainability of the multimodal model: LIME plots
Figure 3 illustrates the prediction process of the multimodal model for two randomly selected cases. For an AP patient without ARDS (Figure 3(a)), the model generated a predicted value of 0.026, consistent with the patient’s observed outcome. The most influential contributors to this prediction included the deep learning model score (−1.272), radiomics model score (−1.035), neutrophil count (11.7 × 10^9/L), procalcitonin (0.19 ng/mL), and blood glucose (7.48 mmol/L), with corresponding contributions of −0.239, −0.075, −0.012, −0.012, and −0.008.

Figure 3. Local explainability of the multimodal model: LIME. (a) A randomly chosen patient with AP without ARDS and (b) a randomly chosen patient with AP and ARDS. alb: albumin; AP: acute pancreatitis; ARDS: acute respiratory distress syndrome; ca: Ca2+ (calcium ion); cr: creatinine; crp: C-reactive protein; glu: glucose; LIME: local interpretable model-agnostic explanations; n: neutrophil; pred_dl: deep learning model; pred_rad: radiomics model; pct: procalcitonin; SIRS: systemic inflammatory response syndrome.
Conversely, Figure 3(b) presents the prediction for a patient with AP who developed ARDS. The model generated a predicted probability of 0.869, which was concordant with the patient’s true outcome. The five most significant contributors to this prediction were the deep learning model score (1.753), radiomics model score (0.442), calcium ion concentration (1.952 mmol/L), neutrophil count (0.19 ng/mL), and CRP (8.302 mg/L), with corresponding contributions of +0.206, +0.197, +0.055, +0.053, and −0.047.
Performance of models and scoring systems
As shown in Figure 4, the multimodal model demonstrated superior performance in the training set (AUC = 0.872) compared with other models and established scoring systems (all p-values < 0.001). Similarly, in the test set, the multimodal model achieved the highest performance (AUC = 0.876).

Figure 4. ROC analyses of models and scoring systems. (a) Training data and (b) test data.
Calibration plots
Figure S1 presents the calibration plots for the multimodal model. In the training set (Figure S1(A)), the multimodal model demonstrated good calibration, with minimal deviation between the observed and predicted probabilities (mean absolute error = 0.072). In the test set (Figure S1(B)), the mean absolute error was 0.032, further indicating high accuracy of the predicted probabilities.
DCA plots
As shown in the DCA plot in Figure S2, the y-axis represents the net benefit. The blue line represents the multimodal model, the red line represents the radiomics model, and the green line represents the deep learning model. The “All” line represents the assumption that all patients with AP developed ARDS, while the “None” line represents the assumption that no patients with AP developed ARDS.
In the training set (Figure S2(A)), the multimodal model provided a positive net benefit across the range of threshold probabilities, except between 72% and 78%. The model also outperformed both the deep learning and radiomics models in terms of net benefit. In the test set (Figure S2(B)), the multimodal model provided a positive net benefit across the range of threshold probabilities, except between 85% and 90%, again demonstrating superior performance compared with the deep learning and radiomics models.
Discussion
This study integrated clinical tabular data, radiomics features, and 3D deep learning features obtained at admission to develop a multimodal model for predicting ARDS in patients with AP. The model demonstrated robust performance in the training set, confirming its superiority over single-modality models and highlighting its favorable calibration and clinical utility. Moreover, the robustness and generalizability of the model were further validated using an independent test set. Additionally, the incorporation of explainable approaches, including global variable importance analysis and local prediction interpretation, enhanced the transparency and interpretability of the model.
In recent years, deep learning has emerged as a benchmark approach in machine learning,19,23 particularly following the advent of convolutional neural networks. 24 Deep learning has achieved significant progress in text and image processing, enabling simultaneous learning from diverse data types. The integration of large volumes of feature-rich data, which are increasingly available in modern medical settings, is expected to enhance model performance. 10 Artificial intelligence (AI) algorithms are particularly well suited to handle the nonlinear, high-dimensional, and complex characteristics of medical data. 25 Thus, integrating multiple data modalities can improve clinical acceptance and diagnostic accuracy, while AI-based tools can help reduce data redundancy and further enhance model performance. 21
ARDS is a life-threatening form of respiratory failure characterized by acute bilateral pulmonary edema and refractory hypoxemia. Early identification of patients at high risk of ARDS, along with accurate prediction of disease severity, is crucial. 26 Previous studies have proposed several scoring systems to predict the severity of AP, including MCTSI, Ranson, and BISAP; however, their feasibility for predicting ARDS remains unclear. Li et al. 27 conducted a multicenter retrospective study involving 597 patients diagnosed with AP across four hospitals. Using multivariate logistic regression analysis, they identified four independent risk factors for both severe AP (SAP) and ARDS: heart rate, respiratory rate, serum calcium, and blood urea nitrogen. For ARDS prediction, the model achieved AUC values of 0.892 and 0.833 in the training and test sets, respectively, outperforming SIRS and the quick Sepsis-Related Organ Failure Assessment. Zou et al. 28 developed predictive models for ARDS following SAP using artificial neural network and logistic regression approaches. They collected clinical tabular data from 214 patients with SAP, who were randomly divided into a training set (n = 149) and a test set (n = 65). The artificial neural network model demonstrated an accuracy of 80.0% and an AUC of 0.853. The most critical predictive variables in this model included the BISAP score, procalcitonin, prothrombin time, and serum calcium. Zhou et al. 29 developed several machine learning models to predict acute respiratory failure in patients with AP using data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. The evaluated algorithms included logistic regression, decision tree, k-nearest neighbors, naive Bayes, and XGBoost. Among these models, the XGBoost model achieved the best predictive performance, with AUCs of 0.86 and 0.87 in the training and validation cohorts, respectively.
Radiomics has also been explored for predicting ARDS in AP, 30 with studies suggesting that semiquantitative metrics derived from chest CT images, such as pleural effusion volume and the number of affected lung lobes with consolidation, may aid in the early identification of ARDS. However, this semiquantitative approach is subject to subjective bias, and not all patients with AP undergo routine chest CT at admission. Requiring an additional chest CT would therefore consume extra medical resources and increase the economic burden on patients.
In this study, we developed a multimodal model that integrates clinical tabular data, radiomics features, and deep learning features derived from abdominal CT scans obtained at admission to predict ARDS in patients with AP. The model achieved AUC values exceeding 0.85 in both the training and test sets, significantly outperforming single-modality models (radiomics and deep learning) and established clinical scoring systems, including MCTSI, Ranson, and BISAP. The model demonstrated greater efficiency in handling high-dimensional and multimodal data, thereby improving predictive accuracy and robustness. Moreover, we collected routine clinical test data at admission to extract all feature variables and subsequently optimized variable selection using the XGBoost algorithm, which identified SIRS, calcium ions, neutrophils, creatinine, C-reactive protein, procalcitonin, blood glucose, and albumin as the top eight tabular variables in the model’s importance ranking, consistent with previous studies. Furthermore, the interpretability of the model was enhanced by visualizing the impact of key variables on prediction outcomes through global and local interpretability graphs, providing intuitive references for clinical decision-making. Finally, calibration and DCA plots further confirmed the model’s predictive accuracy and clinical benefit.
However, this study has several limitations. First, the data samples were primarily derived from the Chinese population, which may restrict the applicability of the model to other racial and regional groups. Notably, the etiology of AP varies significantly across regions: biliary causes predominate in East Asia, whereas alcoholic or hypertriglyceridemic etiologies are more prevalent in Western populations. These etiological differences may influence systemic inflammatory profiles, imaging characteristics, and, consequently, the relevance of identified biomarkers (e.g. calcium and CRP) or radiomic patterns. Additionally, genetic and environmental factors may further influence disease progression and susceptibility to ARDS. Therefore, external validation in geographically and ethnically diverse cohorts is essential before clinical implementation. Second, the retrospective collection of data from multiple centers resulted in the presence of missing data. Although strict inclusion and exclusion criteria, along with a relatively large sample size, may mitigate these impacts, prospective studies are still required to further validate the performance of the multimodal model. Finally, the current model does not incorporate a real-time mechanism to assess the completeness and quality of input data and therefore cannot automatically identify or flag cases in which insufficient data may result in unreliable predictions. Future studies could incorporate uncertainty quantification methods to further enhance the clinical safety of the model.
Conclusion
This study developed and validated predictive models for the early and noninvasive prediction of ARDS in patients with AP by integrating data from multiple modalities across three medical centers. The data modalities included EHRs, radiomics features, and deep learning features extracted from 3D CT scans. This approach may assist medical practitioners in the early identification of patients at high risk of ARDS and potentially improve management strategies for individuals with AP.
Supplemental Material
sj-xlsx-1-imr-10.1177_03000605251410432 - Supplemental material for Multimodal prediction models integrating radiomics and three-dimensional deep learning for acute respiratory distress syndrome in acute pancreatitis patients
Supplemental material, sj-xlsx-1-imr-10.1177_03000605251410432 for Multimodal prediction models integrating radiomics and three-dimensional deep learning for acute respiratory distress syndrome in acute pancreatitis patients by Jielu Zhou, Yuying Wu, Wen Liang, Lin Liu, Chenyang Zhang, Yiping Shen, Meiyu Chen, Yu Wang, Chen Chao, Minyue Yin, Jinzhou Zhu and Hailong Ge in Journal of International Medical Research
Footnotes
Acknowledgments
Not applicable.
Author contributions
J. Zhou, Y. Wu, and L. Liu wrote the manuscript; W. Liang, C. Zhang, Y. Wang, and M. Yin collected the clinical data; Y. Shen, C. Chen, and M. Chen analyzed the clinical data; and H. Ge and J. Zhu contributed to the study design.
Data availability statement
The code used in this study is available upon reasonable request from the corresponding author.
Declaration of conflicting interests
The authors declare no competing interests or financial disclosures.
Funding
This study was supported by the National Natural Science Foundation of China (82000540), Youth Program of Suzhou Health Committee (KJXW2019001), the Open Fund of Key Laboratory of Hepatosplenic Surgery, Ministry of Education (GPKF202304), Science and Technology Projects of Jintan Municipal Health Commission (JTYXH-2025-1-02), Medical Education Collaborative Innovation Fund of Jiangsu University (No. JDYY2023042), and Suzhou key specialty (2241.07.01-0125).
Supplemental material
Supplemental material for this article is available online.
References
