Sage Journals: Discover world-class research

Abstract

Aims

New-onset atrial fibrillation (NOAF) occurs in approximately 23% of patients with sepsis and is independently associated with increased mortality. Therefore, early prediction of NOAF has significant clinical value. However, current artificial intelligence (AI) models predominantly rely on tabular data. These unimodal AI models face limitations in predicting NOAF as they fail to fully utilize the predictive potential arising from the interplay of multimodal data.

Methods

We reviewed current machine learning (ML) and deep learning (DL) approaches for atrial fibrillation (AF) prediction. We summarized the features selected in ML models for predicting AF in intensive care unit (ICU) patients and the advantages of time-window selection in DL models using electrocardiogram (ECG) signals. Notably, we compared these models in terms of feature selection, prediction horizons, and performance when applied to tabular data and ECG signal features. To enhance the predictive capability of ML for NOAF in patients with sepsis, we drew inspiration from multimodal models developed for other diseases, such as Alzheimer's disease, and proposed integrating tabular data and ECG signal data within a multimodal framework.

Results

This study systematically analyzed the application of ML and DL in AF prediction. After screening, 12 studies (6 ML, 6 DL) were included. ML models, based on electronic medical records (EMR) or ECG features, achieved prediction windows ranging from minutes to hours with AUCs of 0.74–0.90. DL models processing raw ECG signals extended prediction windows to days, achieving AUCs of 0.74–0.96, with performance improving with larger datasets. A Transformer-based multimodal model (integrating clinical data and ECG) was proposed to enhance AF prediction in sepsis patients, though further validation is needed for cross-modal data fusion feasibility.

Conclusions

Transitioning from unimodal predictive models to multimodal frameworks that combine tabular clinical data and raw ECG signals is feasible within the current deep-learning framework. This approach has the potential to significantly improve the early prediction capabilities of NOAF in sepsis patients.

Keywords

Sepsis, atrial fibrillation, multimodal prediction, deep learning, machine learning

Introduction

Atrial fibrillation (AF) is the most prevalent cardiac arrhythmia with a lifetime risk ranging from 22% to 26%, and its prevalence increases with age.¹ It is a major risk factor for heart failure, ischemic stroke, and increased mortality, imposing a heavy burden on the healthcare system.² For intensive care unit (ICU) patients, new-onset atrial fibrillation (NOAF) serves as both an indicator of disease severity and a potential contributor to adverse prognosis, occurring in approximately one in every six patients.

Sepsis is a severe and potentially fatal systemic response to infection that threatens millions of lives worldwide each year. As a common complication of critical illness, AF is the most predominant arrhythmia observed in patients with sepsis. Studies have indicated that the incidence of NOAF in these patients ranges from 20% to 30%.^2,3 If not promptly detected and treated, it can lead to severe health consequences including progressive organ dysfunction and even death. Moreover, its occurrence in septic patients is not merely a marker of severity but is independently associated with a significantly increased risk of in-hospital stroke and mortality.⁴ Consequently, the prediction and early detection of NOAF are imperative in patients with sepsis. Several models have been established to predict and detect NOAF in ICU patients, and the AF score is currently recognized as the gold standard. However, its clinical application is limited owing to its low accuracy.

ML involves algorithms that utilize statistical and optimization techniques to learn from historical data. The primary objective of ML is to identify or predict significant outcomes from large-scale datasets. When applied to medical datasets, ML often performs better than the traditional clinical scoring systems.^5,6 In recent years, ML models have been increasingly employed to predict NOAF, emphasizing the crucial necessity of effective feature engineering and model validation to improve the predictive accuracy and clinical applicability.^7,8 These models can be divided into two categories: the early warning model to predict the occurrence of AF during sepsis and the real-time detection model. Researchers in the former often rely on tabular data, such as demographics, physical signs, and test results, to train models for predicting the risk of AF during sepsis. Raw ECG signals that provide critical information are often overlooked. In contrast, the latter studies are mostly based on bedside ECG data, which mainly focus on the automatic diagnosis of AF when it occurs.^9,10 Despite their high accuracy, these prediction models may not provide clinicians with sufficient information to prevent the onset of AF, due to their short prediction horizons—often limited to only a few minutes before onset.^11,12 Additionally, the integration of ECG data has been demonstrated to significantly enhance predictive capabilities, allowing for a more comprehensive assessment of patients and improved early intervention strategies.

This article presents a systematic review (SR) by searching the PubMed database to identify studies focused on ML models for predicting NOAF, with a particular emphasis on multimodal models. We compared these models in terms of feature selection using tabular data and ECG signals, clinical applications, and predictive performance. Additionally, we referenced the application of multimodal models to other diseases (such as Alzheimer's disease) to evaluate their potential value for NOAF prediction.

Materials and methods

Protocol and registration

This SR was conducted based on the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and was recorded in the PROSPERO registry for SRs (ID number: CRD420250654679). The full protocol is available at: https://www.crd.york.ac.uk/prospero/.

Search strategy

We conducted a systematic search of PubMed, Embase, Medline and Scopus databases covering publications available until Feb 16, 2025. The search strategy employed the following terms: (atrial fibrillation OR new-onset atrial fibrillation) AND (artificial intelligence OR machine learning OR Deep Learning) AND (intensive care unit OR ICU OR Sepsis).
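The Boolean structure of this search string can be assembled programmatically, which helps keep the query identical across databases. The sketch below is illustrative only: the helper name `build_query` is hypothetical, and the field tags and syntax specific to each database would still need to be added by hand.

```python
def build_query(*concept_groups):
    """Join synonyms with OR inside each group and combine the groups with AND."""
    return " AND ".join(
        "(" + " OR ".join(terms) + ")" for terms in concept_groups
    )


# Term groups copied from the search strategy described above.
condition = ["atrial fibrillation", "new-onset atrial fibrillation"]
method = ["artificial intelligence", "machine learning", "Deep Learning"]
setting = ["intensive care unit", "ICU", "Sepsis"]

query = build_query(condition, method, setting)
print(query)
```

Keeping each concept in its own list makes it easy to add a synonym without disturbing the AND/OR nesting.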

The identification and removal of duplicate records were conducted using Zotero prior to the screening process. Data extraction was subsequently performed using Excel software. Two independent reviewers (W.X. and B.J.) carried out the initial title and abstract screening. The full-text review and subsequent screening were conducted by B.J. and Y.F.L., with all exclusion reasons being systematically documented during the screening process. During title/abstract screening and full-text review, any disagreements between the three independent reviewers were resolved through consensus discussions.

Inclusion and exclusion criteria

Studies were eligible for inclusion if they met the following criteria:

Study objective: Focused on predicting clinical outcomes of AF, including both pre-existing and new-onset cases.

Study population: Included adult patients (age > 18 years) admitted to ICU or diagnosed with sepsis.

Methodology: Utilized ML or DL techniques for predictive modeling.

Model description: Provided detailed descriptions of the ML/DL models and predictor variables used for risk prediction.

Study design: Included randomized controlled trials (RCTs), observational studies, cohort studies, case-control studies, and review studies.

Language: Published as full-text articles in English.

Exclusion criteria encompassed case reports, comments, letters to the editor, editorials, study protocols, and replies. Studies involving pediatric patients and those not published in English were also excluded. The studies were required to provide a description of the ML or DL models and predictor variables used in prediction and/or detection of NOAF in sepsis patients. Data collected from the studies included author information, year of publication, variables used in the model, population studied, prediction model type, and model performance metrics, including the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, area under the precision–recall curve, precision, recall, and F1 score of the ML or DL algorithms in predicting and detecting the occurrence of NOAF.
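Most of the extracted threshold-based metrics follow directly from a 2×2 confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and do not come from any included study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one true positive,
    one true negative, and one prediction of each class).
    """
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                  # also called precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=130, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC and the area under the precision–recall curve are not covered here because they depend on the full ranking of predicted scores rather than on a single threshold.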

Risk of bias assessment in studies

The risk of bias assessment was systematically conducted by evaluating five critical parameters: a sufficiently sized cohort, appropriate cross-validation methods, an external validation dataset, blinding of participants and personnel, and the handling of incomplete outcome data. For data extraction, a predefined form was employed to capture essential information including study design, sample size, patient demographics, intervention or exposure specifics, feature selection, outcome measures, model performance, comparator details, follow-up duration, funding sources, and potential conflicts of interest.

Results

Study selection and description

A comprehensive literature search was conducted in November 2024, yielding 2497 records related to various aspects of AF, including risk factors, prediction, treatment, prophylaxis, monitoring, detection, and outcomes. Among these, 2302 articles were identified as potentially relevant for examining the application of ML or DL in AF prediction and detection. Two independent expert reviewers performed the screening process, ultimately selecting 66 articles from the initial pool. A secondary screening was then conducted to exclude studies that lacked detailed descriptions of their ML/DL models. Any disparities between the reviewers were resolved through discussion and consensus. Ultimately 12 studies, each aligning with at least one of the designated outcomes, were determined to be eligible for inclusion in this SR. These studies were subsequently categorized into ML-based AF prediction models (n = 6) and DL-based AF prediction models (n = 6). The study selection process is detailed in the PRISMA flowchart presented in Figure 1.

Figure 1.

PRISMA diagram of the literature search and exclusion process flowchart. AI: artificial intelligence; ML: machine learning; NOAF: new-onset atrial fibrillation; SR: systematic review; ICU: intensive care unit; DL: deep learning.

The ML models reviewed in this study were trained using datasets derived from several large-scale, publicly or institutionally available sources. These included the MIMIC intensive care databases (MIMIC-II,¹³ III,¹⁴ IV¹⁵), the Kensington General Hospital (KGH) database,¹⁶ AmsterdamUMCdb,¹¹ the AF Prediction Challenge database,¹⁷ the Ghent University Hospital database,¹¹ and the Belgian hospital ICU database.¹¹ The diversity of these databases covers a broad range of patient characteristics and clinical scenarios, thus facilitating robust model evaluation. A brief description of the databases used in this study is presented in the Supplementary material.

The current ML-based AF prediction models

The types of features utilized in different supervised ML models and their performances for AF prediction are summarized in Table 1.

Table 1.

Comparison of current ML-based AF prediction models using tabular data.

Optimal model in literature	Types of features used in the model	Number of participants in the cohort	Prediction point (from the non-AF to AF)	Performance of ML algorithms
CatBoost classification models¹¹	Demographic data, laboratory data, vital signs, medication data, ICU scoring, no ECG data	101,114	1.5–13.5 hours	AUC = 0.81
Gradient boosted machine¹²	Same as above	6349	N/A	AUC = 0.74
Random forest¹⁸	Only ECG feature data (RR interval-based features: HRV metrics, Poincaré plot descriptors, etc.)	25	10 minutes	AUC = 0.80
Logistic regression¹⁹	Ventilation status (e.g., FiO2), vital signs, demographic data, laboratory data, no ECG data	18,518	1 hour	AUC = 0.82
XGBoost²⁰	Same as above	16,528	N/A	AUC = 0.89
K-NN²¹	Only ECG feature data (RR interval-based features: HRV metrics, Poincaré plot descriptors, etc.)	100	2.5–7.5 minutes	Acc = 0.90

AUC: area under the curve; K-NN: k-nearest neighbors; XGBoost: eXtreme Gradient Boosting; Acc: accuracy; HRV: heart rate variability.

Among the models, CatBoost¹¹ utilized the most comprehensive feature set, excluding ECG data. Tested on a cohort of 101,114 participants, it achieved an AUC of 0.81 with a prediction window of 1.5–13.5 hours. Similarly, the Gradient Boosted Machine¹² used the same feature set, but was tested on a smaller cohort of 6349 participants, resulting in a lower AUC of 0.74. In contrast, Random Forest¹⁸ focused exclusively on ECG features, tested on only 25 participants, and achieved an AUC of 0.80 with a short prediction window of 10 minutes. Logistic Regression¹⁹ employed a diverse feature set, including ventilation parameters (e.g., FiO2), vital signs, demographic data, and laboratory results, while excluding ECG data. It achieved an AUC of 0.82 in a cohort of 18,518 participants with a 1-hour prediction window. XGBoost²⁰ used a similar feature set, achieving the highest AUC of 0.89 across 16,528 participants, demonstrating its ability to handle complex, multivariate data. Finally, K-Nearest Neighbors (K-NN)²¹ relied solely on ECG features and achieved an accuracy of 0.90 in a cohort of 100 participants with a prediction window of 2.5–7.5 minutes. However, its reliance on accuracy as a metric may not adequately reflect the model performance in imbalanced datasets.
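The caveat about accuracy on imbalanced data can be made concrete with a toy example. The sketch below uses invented labels and scores (not data from any reviewed study) and computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half.

```python
def auc_from_scores(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy cohort: 18 patients without AF, 2 with AF (10% prevalence).
labels = [0] * 18 + [1] * 2
# An uninformative model that gives every patient the same low score.
scores = [0.1] * 20

acc = accuracy_at(labels, scores)
auc = auc_from_scores(labels, scores)
print(f"accuracy = {acc:.2f}, AUC = {auc:.2f}")
```

In this toy cohort the model never predicts AF, yet its accuracy is 0.90 while its AUC is 0.50, which is why accuracy and AUC values from different studies are not directly comparable.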

In this study, tabular data are defined as a structured representation of features, typically organized in a table format, where rows correspond to individual samples (participants) and columns represent feature variables. These features are classified as either numerical (e.g., laboratory results and vital signs) or categorical (e.g., demographics and medication usage). This format is versatile, allowing for the integration of various data types, such as demographic information, laboratory results, vital signs, medication records, and ICU scoring systems (e.g., APACHE II and SOFA). However, although tabular data provide a standardized approach for feature utilization, they often lack the temporal resolution needed to fully capture sequential patterns, such as the dynamic variations observed in ECG signals.

Table 1 indicates that the six existing NOAF prediction models can be broadly classified into two categories based on the type of feature data utilized. The first category includes demographic information, laboratory results, vital signs, medication data, ventilation status, and ICU scoring,^11,12,19,20 which are typically derived from electronic medical record (EMR) systems and encompass comprehensive patient health information. Moreover, tabular data can be conveniently collected and organized from EMR, facilitating effective data analysis and model development. Other researchers’ AF testing models showed that the AUC of AF detection was 0.79 during a 6-month follow-up. This performance is comparable to that of several recent non-AI-based clinical AF risk scores, which exhibit an AUC range from 0.71 to 0.79.^22–24 In terms of time prediction windows, these models generally predict the timing of NOAF occurrence at an hourly resolution. Thus, AI-EHR-based predictive tools may eventually play a role in clinical practice for predicting the occurrence of NOAF.²⁵

In comparison, the second category focuses exclusively on ECG data, encompassing various HRV-related features, as shown in Figure 2. These features include the standard deviation of the heart rate data (SDNN), the total number of consecutive heart rate differences exceeding 50 ms (NN50), ratio of NN50 to the total number of RR intervals (pNN50), skewness and kurtosis of the heart rate data, root mean square of successive differences (RMSSD), Poincaré plot features, sample entropy (SE), multiscale entropy (ME), approximate entropy (AE), very-low-frequency power (VLF), low-frequency power (LF), high-frequency power (HF), and their respective power spectral densities (PSDs). These features are extracted using methods such as the tunable Q-factor wavelet transform (TQWT), variable frequency complex demodulation (VFCDM), and time-domain and nonlinear feature extraction techniques. Although ECG data contain features directly related to NOAF and may implicitly include signal patterns that precede the onset of AF, the second type of model converts the original ECG signal into tabular data through feature extraction during the training process; such models are commonly referred to as “shallow models.” ML-processed ECG features are insufficient to capture subtle patterns preceding AF onset and fail to incorporate the molecular mechanisms leading to AF. Consequently, their prediction horizon remains limited to a minute-level timeframe.

Figure 2.

Feature extraction of HRV signals using TQWT, VFCDM, and time domain and nonlinear methods. SE: sample entropy; AE: approximate entropy; ME: multiscale entropy; LF: low-frequency power; HF: high-frequency power; PSDs: power spectral densities; TQWT: tunable Q-factor wavelet transform; VFCDM: variable frequency complex demodulation; SDNN: the standard deviation of the heart rate data; NN50: the total number of consecutive heart rate differences exceeding 50 ms; pNN50: ratio of NN50 to the total number of RR intervals; RMSSD: root mean square of successive differences.
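The time-domain HRV features listed above reduce to a few lines of arithmetic on a series of RR intervals. The following is a minimal sketch, assuming RR intervals in milliseconds, the sample standard deviation for SDNN, and the pNN50 denominator stated in this review (the total number of RR intervals). The function name and the six-beat series are hypothetical.

```python
import math
import statistics


def hrv_time_domain(rr_ms):
    """Compute SDNN, RMSSD, NN50 and pNN50 from RR intervals (ms)."""
    # Successive differences between neighbouring RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    nn50 = sum(1 for d in diffs if abs(d) > 50)
    return {
        "SDNN": statistics.stdev(rr_ms),
        "RMSSD": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "NN50": nn50,
        # Denominator follows the definition given in this review.
        "pNN50": nn50 / len(rr_ms),
    }


# Invented six-beat series for illustration only.
rr = [800, 810, 790, 860, 850, 780]
features = hrv_time_domain(rr)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Each recording collapses to one row of such numbers, which is what turns an ECG signal into tabular input for a shallow model.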

Indeed, the arrangement and dispersion of myocardial fibers play a crucial role in shaping the heart's anisotropic conductive properties, which in turn affect the occurrence and maintenance of AF. By employing DL algorithms to directly extract features from 12-lead (or multi-lead) ECG signals, these features may contain signal patterns related to parameters such as QRS duration, QT interval, and Tpeak-Tend interval, rather than directly calculating these parameters. This method of data analysis capitalizes on the advantage of DL in automatically performing feature engineering.

In summary, both types of models have limited prediction resolution. Therefore, we introduce a DL model in the next section that can capture the original ECG signal features from several days prior as input, thus extending the time window from an hourly resolution to a weekly scale.

The current DL-based AF prediction models

In recent years, DL has achieved remarkable progress in the medical field, with its applications in disease diagnosis, treatment prediction, and personalized medicine expanding steadily.²⁶ In the area of medical imaging analysis, breakthroughs in DL algorithms, particularly convolutional neural networks (CNNs), have enabled models to achieve performance in disease detection (e.g., tumor segmentation and lung nodule detection) that rivals or even surpasses that of human experts.^27,28 In the field of natural language processing (NLP), DL-based models are widely used to process electronic health records, medical literature, and physician notes, thereby facilitating automated information extraction, diagnostic support, and decision-making.²⁹ Additionally, recurrent neural networks (RNNs), their variants (e.g., LSTM), and Transformer architectures have demonstrated exceptional capabilities in analyzing physiological signals such as ECG and electroencephalograms (EEG), effectively identifying complex temporal patterns and supporting early disease warning systems.³⁰ For example, Thivya Anbalagan's study proposed a DNN-based approach that converts 1D ECG signals into 2D patterns via Chirplet/Stockwell transforms for identifying AF in the presence of noise and other beats. Their ensemble model (ShuffleNet + AlexNet) achieved 93.7% accuracy on the CinC 2017 dataset by leveraging spectral-temporal features, demonstrating the potential of hybrid signal processing and lightweight architectures.³¹ The developed model can effectively identify the severity of AF conditions and may help reduce arrhythmia-related mortality over time.

It is worth noting that ECG feature data fundamentally belong to the category of tabular data. Using these features, traditional ML models can effectively predict NOAF. However, these shallow models cannot directly process raw ECG signals containing a wealth of complex patterns. By utilizing DL methods, it is possible to analyze these raw signals and predict AF occurrences several days in advance. Numerous researchers have proposed neural network architectures that are specifically designed to process raw ECG signal data. A summary of these methodologies is presented in Table 2.

Table 2.

Comparison of current DL-based models for long-term AF prediction.

Literature	Neural network and ECG signal type	Number of records	Prediction point (from the non-AF to AF)	Performance of DL algorithms
H. Zhu et al.³²	ResNet; 12-lead ECG	49,300	30 days	AUC = 0.96
S. Rooney et al.³³	DL model incorporating convolutional and transformer layers; 12-lead ECG	84	7.5 minutes	AUC = 0.74
N. Yuan et al.³⁴	Atrous convolutional neural network; 12-lead ECG	907,858	31 days	AUC = 0.86
M. Gadaleta et al.³⁵	DL model incorporating Bidirectional LSTM and RNN layers; single-lead ECG	459,889	14 days	AUC = 0.80
P. Melzi et al.³⁶	ResNet; 12-lead ECG	26,657	6 months	AUC = 0.79
T. Habineza et al.³⁷	ResNet; 12-lead ECG	7566	40 weeks	AUC = 0.85

Based on the data presented in the table, we analyzed the performance of the DL models in the analysis of ECG signals. Firstly, we observed the performance of models using ResNet and 12-lead ECG in different studies. H. Zhu et al.'s study³² utilized 49,300 records with a predicted time frame of 30 days, achieving a high AUC value of 0.96. In contrast, P. Melzi et al.'s study,³⁶ which also employed ResNet with 12-lead ECG, had a reduced number of records at 26,657, with an extended prediction time of 6 months, resulting in a lower AUC value of 0.79. This suggests that an increase in the number of ECG raw signal samples correlates with an enhancement in the predictive performance of the model, as measured by the AUC value.

Further analysis reveals that N. Yuan et al.'s study,³⁴ with the highest number of records (907,858), had a prediction time of 31 days and achieved an AUC value of 0.86. This is an improvement over T. Habineza et al.'s study,³⁷ which had 7566 records, a prediction time of 40 weeks, and an AUC value of 0.85, despite the latter's longer prediction horizon. Additionally, M. Gadaleta et al.'s study,³⁵ with a relatively smaller dataset of 459,889 records, had a prediction time of only 14 days and still achieved an AUC value of 0.80. These data points support the hypothesis that an increase in the number of ECG raw signal samples is associated with higher AUC values, whereas an extended prediction time does not necessarily ensure an increase in AUC values. This may be related to the complexity of the model, quality and quantity of the training data, and the generalization capabilities of the model. In several of these comparisons, a longer prediction time coincided with a lower AUC value, which may indicate that the predictive accuracy of the model decreases as the prediction time increases.

In summary, DL models demonstrate greater predictive accuracy when dealing with a larger number of ECG raw signal samples. However, the relationship between the length of the prediction time and the AUC value is not linear, indicating that when designing and evaluating DL models, it is necessary to consider a comprehensive range of factors, including the amount of data, the prediction time frame, and the complexity of the model's architecture.
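The association between dataset size and AUC in Table 2 can be summarized with a rank correlation. The sketch below applies Spearman's coefficient to the six rows of Table 2, using the no-ties formula ρ = 1 − 6Σd² / (n(n² − 1)). It is a descriptive illustration rather than part of the original analysis, and the function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation using the no-ties shortcut formula."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Rows of Table 2 in order: Zhu, Rooney, Yuan, Gadaleta, Melzi, Habineza.
records = [49_300, 84, 907_858, 459_889, 26_657, 7_566]
auc = [0.96, 0.74, 0.86, 0.80, 0.79, 0.85]

rho = spearman_rho(records, auc)
print(f"Spearman rho (records vs AUC) = {rho:.2f}")
```

For these six rows the coefficient is 0.60, a moderate positive association. With only six studies that differ in architecture, lead configuration, and prediction horizon, this figure is descriptive and does not establish a causal effect of sample size.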

Multimodal data integration for predicting NOAF in sepsis patients

The aforementioned ML or DL models were not specifically developed for predicting NOAF in sepsis patients, and these models were all trained using unimodal data. The integration of multimodal data plays a crucial role in the prediction of AF. By combining different types of data sources, such as ECG features, with clinical laboratory data, predictive models can more comprehensively capture the various factors that trigger AF. This fusion of data not only provides a more holistic perspective but also helps to enhance the accuracy and specificity of predictive models.

However, to date, no AI model has been able to effectively integrate ECG data with clinical laboratory data to predict AF. In the following sections, we explore in detail the potential application of clinical tabular data and electrophysiological data in AF prediction models for sepsis patients as well as the feasibility of DL model architectures in this field.

In the healthcare field, a review of multimodal ML approaches indicated that traditional ML methods in healthcare have primarily focused on single-modality data, which restricts their ability to integrate diverse information sources in clinical practice to improve decision-making.³⁸ In the field of disease diagnosis, research progress in multimodal models has been rapid. Although the integration of multimodal data faces challenges in precision oncology, it indicates a direction for future development. For example, by combining medical imaging, clinical records, and genomic data, a more comprehensive understanding of tumor characteristics can be achieved,³⁹ thereby enhancing predictive capabilities for risk assessment, cancer progression, and treatment response. In the assessment of Alzheimer's disease, multimodal DL models have been developed that can process a variety of clinical information, including demographic data, medical history, neuropsychological tests, neuroimaging, and functional assessments, matching the diagnostic accuracy of practicing neurologists and neuroradiologists.⁴⁰ Moreover, AI-based multimodal data differential diagnosis models have shown potential in the diagnosis of dementia etiologies as they can effectively handle and integrate heterogeneous data from different sources, such as medical imaging, clinical records, and biomarkers, to improve the accuracy and efficiency of diagnosis.⁴¹

Inflammatory factors are significantly associated with the occurrence of AF in patients with sepsis.⁴²

The systemic inflammatory response triggered by sepsis leads to the release of various proinflammatory factors. These factors can trigger a range of reactions, including the activation of the clotting pathway, which may contribute to the development of AF.⁴³ The presence of inflammatory markers is associated with atrial remodeling, which further increases the risk of AF.⁴⁴ Systemic inflammation contributes to widespread endocardial and endothelial dysfunction, promoting atrial structural and electrical remodeling, which may lead to AF.⁴² Various inflammatory biomarkers have been identified as potential predictors of AF, including C-reactive protein (CRP),^45,46 interleukins (e.g., IL-6), and red blood cell distribution width (RDW).⁴⁵ International normalized ratio (INR) has been identified as an independent predictor of NOAF in sepsis patients and serves as a key variable in NOAF prediction models.⁴⁷ Existing studies suggest that these biomarkers can serve as important features in multimodal predictive models.

Moreover, integrating ECG signals is essential for accurate prediction of AF in patients with sepsis. ECG data provide real-time insights into electrical and structural changes in the atria, allowing for the detection of early atrial remodeling signs that may precede AF. By combining ECG data with inflammatory markers, prediction models can more effectively capture the interplay between electrical activity and inflammation-driven atrial changes, enhancing the model's predictive power and specificity. This multimodal approach highlights the importance of ECG feature fusion, which may offer critical early warning signs and facilitate timely intervention in managing AF risk among sepsis patients.

Thus, it is evident that integrating tabular data and ECG signals for multimodal NOAF prediction is mechanistically feasible. To enhance the capability of ML in predicting NOAF in sepsis patients, we drew inspiration from multimodal modeling approaches used in other diseases, such as Alzheimer's disease,⁴¹ and proposed a multimodal implementation strategy for AF prediction in sepsis patients. We propose a Transformer-based multimodal model that integrates four key data modalities to predict NOAF in sepsis patients as illustrated in Figure 3.

Figure 3.

An overview diagram of a multimodal prediction model for NOAF in sepsis patients. NOAF: new-onset atrial fibrillation; ECG: electrocardiogram signals; RP: recurrence plot; Emb: embedding; MMSE: minimum mean square error; MLP: multilayer perceptron.

Tabular clinical data: These include demographic information, laboratory results (e.g., inflammatory markers like CRP, IL-6), vital signs, and ICU scores (e.g., SOFA, APACHE II). These features capture systemic inflammation and organ dysfunction, which are mechanistically linked to atrial remodeling and NOAF.

ECG signals: Raw 12-lead ECG data provide direct insights into electrical activity and atrial abnormalities. Temporal patterns in ECG signals (e.g., P-wave morphology, QT intervals) are critical for early AF detection.

ECG-derived images: Recurrence plots (RPs) transform 1D ECG signals into 2D representations, highlighting nonlinear dynamics (e.g., chaos, periodicity) that may precede AF onset.

Textual data: Clinical notes or reports (e.g., radiology findings) contextualize the patient's condition, such as lung injury severity, which may indirectly influence cardiac load.

In terms of feature embedding and cross-modal fusion, each modality is handled as follows.

Tabular data: Categorical variables (e.g., gender) are one-hot encoded, while numerical features (e.g., INR levels) are projected into embeddings via a linear layer.

ECG signals: Processed using DDxNet (a pre-trained CNN) to extract high-level embeddings.

ECG images: Converted into embeddings via ResNet, capturing spatial patterns in RPs.

Textual data: Processed using a clinical BERT model to generate context-aware embeddings.

The transformer's self-attention mechanism then dynamically weights interactions between modalities.
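How self-attention weights the interactions between modality embeddings can be illustrated with a single-head, scaled dot-product example. The sketch below is deliberately simplified and makes several assumptions: the four 4-dimensional vectors are invented stand-ins for encoder outputs, the learned query, key, and value projections of a real Transformer are replaced by the identity, and plain Python is used instead of a DL framework.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention, identity projections.

    tokens: list of equal-length embedding vectors, one per modality.
    Returns (fused_tokens, attention_weights).
    """
    d = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in tokens
        ]
        weights.append(softmax(scores))
    fused = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return fused, weights


# Invented embeddings standing in for the four modality encoders.
modalities = {
    "tabular": [0.2, 0.1, 0.0, 0.5],
    "ecg_signal": [0.9, 0.3, 0.1, 0.0],
    "ecg_image": [0.8, 0.2, 0.2, 0.1],
    "text": [0.0, 0.4, 0.6, 0.1],
}
fused, attn = self_attention(list(modalities.values()))
for name, row in zip(modalities, attn):
    print(name, [round(w, 3) for w in row])
```

Each row of `attn` sums to one and indicates how strongly a given modality draws on the others when its fused representation is formed.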

RP is a powerful method for analyzing the periodicity, chaotic behavior, and non-stationary properties of time series, and is commonly utilized for visualizing dynamic systems.^48–51 Zhu et al.³² successfully employed the RP technique to predict the occurrence of AF up to 30 days in advance. By converting a 1D time series into a 2D recurrence matrix through recursive relationships, the RP method enhances the model's ability to identify complex patterns and improves the recognition accuracy.
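In its simplest form, a recurrence plot marks every pair of time points whose values lie within a distance threshold of each other. The following is a minimal sketch under two assumptions: no time-delay embedding is applied, and the threshold `eps` and the toy signal are arbitrary illustrative choices rather than values taken from the cited studies.

```python
def recurrence_plot(series, eps):
    """Binary recurrence matrix: R[i][j] = 1 if |x_i - x_j| <= eps, else 0."""
    return [[1 if abs(a - b) <= eps else 0 for b in series] for a in series]


# Toy periodic signal (period of four samples), not real ECG data.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, 0.5, 1.0]
rp = recurrence_plot(signal, eps=0.1)

for row in rp:
    print("".join("#" if cell else "." for cell in row))
```

The resulting matrix is symmetric with a filled main diagonal, and periodic structure in the signal shows up as additional diagonal lines, which is the kind of 2D texture an image model can learn from.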

The medical community has increasingly made available various multimodal datasets that integrate medical images, textual reports, and clinical annotations to advance applications in diagnosis, image segmentation, and report generation. MedTrinity-25M⁵² is a large-scale dataset released by leading institutions, containing over 25 million medical images with detailed, multi-granularity annotations to support multimodal model training and evaluation. MedPix 2.0 is an improved version of the original MedPix dataset,⁵³ providing both medical images and corresponding clinical reports and diagnostic descriptions, serving as a valuable resource for training large-scale multimodal medical models. M3D datasets⁵⁴ focus on 3D medical imaging and associated textual information, making them ideal for tasks such as image-text retrieval, report generation, and visual question answering.

In addition to these, the MIMIC-IV Clinical Database captures comprehensive clinical EHR data, including demographics, diagnoses, treatments, and laboratory tests. Complementing this, the PhysioNet Waveform Database^55,56 contains high-resolution physiological signals such as ECG waveforms. By leveraging shared patient IDs and hospitalization details, researchers can match patients’ ECG signals with their corresponding clinical records. This integrated dataset is particularly valuable for training predictive models for conditions like AF, as well as other diseases that are associated with physiological waveform signals.
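The record-matching step amounts to a join on shared identifiers. The sketch below is hypothetical throughout: the field names `subject_id`, `hadm_id`, and `waveform_path`, the helper `link_records`, and all rows are assumptions made for illustration, and real linkage would also need to check that each recording falls within the admission's time window.

```python
def link_records(clinical_rows, ecg_rows):
    """Attach each ECG recording to the clinical row with the same patient and stay."""
    by_stay = {(r["subject_id"], r["hadm_id"]): r for r in clinical_rows}
    linked = []
    for ecg in ecg_rows:
        key = (ecg["subject_id"], ecg["hadm_id"])
        if key in by_stay:  # drop recordings without a matching admission
            linked.append({**by_stay[key], "waveform_path": ecg["waveform_path"]})
    return linked


# Invented rows for illustration only.
clinical = [
    {"subject_id": 1, "hadm_id": 10, "crp": 120.0, "sofa": 7},
    {"subject_id": 2, "hadm_id": 20, "crp": 35.0, "sofa": 3},
]
ecg = [
    {"subject_id": 1, "hadm_id": 10, "waveform_path": "rec_a"},
    {"subject_id": 1, "hadm_id": 10, "waveform_path": "rec_b"},
    {"subject_id": 3, "hadm_id": 30, "waveform_path": "rec_c"},
]

linked = link_records(clinical, ecg)
for row in linked:
    print(row)
```

One admission can carry several recordings, so the join is one-to-many: the clinical values are repeated for each linked waveform.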

Transformers require input tokens in the form of embeddings. Therefore, medical data must be adapted into a unified embedding format that the model can process. We categorized multimodal medical data into four types: imaging data compressed into embeddings via ResNet; fixed-length ECG signals (e.g., for 24-hour-ahead prediction) converted into embeddings using DDxNet⁵⁷; categorical data encoded with one-hot embeddings; and numerical data projected into the transformer's input space through a single linear layer to preserve the original data structure. ResNet and DDxNet are pre-trained as classification models that distinguish ECG signals from patients without AF from those of patients experiencing their first AF episode, and the embeddings are extracted from the final layer of each network.
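
To make the tokenization step concrete, the sketch below turns one categorical feature, a small vector of numerical features, and a precomputed ECG embedding into three tokens of equal width. Everything here is an assumption for illustration: the width `D_MODEL`, the helper names `one_hot` and `make_linear`, the feature values, and the random weights that stand in for trained parameters. The ECG embedding is a random placeholder rather than the output of DDxNet, and projecting the one-hot and ECG vectors to a common width is one possible way to obtain a unified format, not necessarily the design intended here.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 16  # assumed transformer embedding width


def one_hot(index, num_classes):
    """Return a one-hot vector for a categorical value."""
    vec = np.zeros(num_classes)
    vec[index] = 1.0
    return vec


def make_linear(in_dim, out_dim=D_MODEL):
    """Create a single linear layer; random weights stand in for trained ones."""
    weight = rng.normal(scale=0.1, size=(in_dim, out_dim))
    bias = np.zeros(out_dim)
    return lambda x: x @ weight + bias


# Hypothetical inputs for one patient (values are illustrative only).
sex = one_hot(1, num_classes=2)      # categorical feature
labs = np.array([98.0, 2.4, 37.9])   # numerical features
ecg_embedding = rng.normal(size=32)  # placeholder for an ECG encoder output

project_cat = make_linear(sex.size)
project_num = make_linear(labs.size)
project_ecg = make_linear(ecg_embedding.size)

# One token per modality, all of width D_MODEL, ready for a transformer encoder.
tokens = np.stack([project_cat(sex), project_num(labs), project_ecg(ecg_embedding)])
print(tokens.shape)  # (3, 16)
```

In a trained model the projection weights would be learned jointly with the transformer, and numerical inputs would normally be standardized before projection.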

Discussion

This study demonstrates that machine learning (ML) and deep learning (DL) exhibit outstanding performance in cardiovascular disease risk prediction. The reviewed ML/DL models achieved AUC values of 0.74–0.96 in predicting new-onset atrial fibrillation (NOAF), comparable to the predictive performance (AUC 0.71–0.96) for sudden cardiac death (SCD) reported by Barker's team.⁵⁸ The AI-based electrocardiogram analysis conducted by Attia et al.⁵⁹ further substantiated the pivotal value of ECG in arrhythmia prediction. By innovatively integrating clinical tabular data with ECG signals, the developed multimodal prediction framework aligns with the trend of multi-source data fusion in medical AI. This approach echoes both the multi-data integration strategy for Alzheimer's disease diagnosis proposed by Qiu et al.⁴⁰ and the multidimensional SCD prediction concept advanced by Barker et al.⁵⁸ The findings also reveal a positive correlation between DL model performance and ECG sample size. Notably, Zhu et al.³² achieved an AUC of 0.96 using 49,300 ECG records, a finding corroborated by Goldstein et al.'s⁶⁰ analysis of extensive dialysis data (n = 22,000,000). Hannun et al.⁶¹ further demonstrated that with over 90,000 ECG samples, DL models attained diagnostic accuracy comparable to that of cardiologists, unequivocally proving the determinant role of big data in AI model performance.

This study focuses on predicting sepsis-associated new-onset atrial fibrillation (NOAF), which involves atrial electrical remodeling induced by systemic inflammation (e.g., CRP, IL-6), a pathophysiological mechanism distinct from the myocardial electrical instability and structural abnormalities underlying the SCD prediction mechanisms examined by Barker et al.⁵⁸ In terms of the research population, while Attia et al.'s⁵⁹ atrial fibrillation model targeted general cases, this study emphasizes sepsis-specific pathophysiological characteristics, particularly infection-mediated myocardial injury. Methodologically, we innovatively employed recurrence plot (RP) transformation of ECG signals to capture nonlinear dynamic features, an approach that is novel to NOAF prediction and conceptually aligned with Mathunjwa et al.'s⁴⁹ RP application in arrhythmia classification. In contrast, Tse et al.'s⁶² SCD prediction studies predominantly rely on conventional ECG parameters (e.g., QT interval), whereas our study additionally incorporates clinical text data (e.g., radiology reports). Architecturally, our Transformer-based multimodal data processing contrasts with the traditional random forest or support vector machine approaches predominantly used for SCD prediction, as reviewed by Barker et al.⁵⁸

Study limitations: While this systematic review highlights the potential of multimodal ML/DL models for predicting NOAF in sepsis patients, several limitations should be acknowledged. First, the included studies predominantly utilized retrospective data from heterogeneous sources, which may introduce biases in model generalizability. Second, the current evidence is limited by the small number of studies (n = 12) and the lack of external validation for most models, particularly those integrating multimodal data. Third, the proposed transformer-based multimodal framework, though theoretically promising, requires empirical validation in prospective clinical cohorts to assess its feasibility and performance. Additionally, the interpretability of DL models remains a challenge, potentially limiting their clinical adoption.

Future directions: While a few sepsis-specific clinical risk scores exist,⁴⁷ the application of advanced ML/DL models tailored to the unique pathophysiology of sepsis remains an under-explored area with significant potential. To enhance the reproducibility and generalizability of multimodal models, we recommend adopting the PROBAST-AI framework to standardize reporting protocols, followed by multicenter validation studies leveraging established databases such as PhysioNet (Research Resource for Complex Physiologic Signals database) and MIMIC-IV (Medical Information Mart for Intensive Care IV database). Such standardization is essential for facilitating clinical translation. Furthermore, integrating wearable technologies with temporal modeling approaches (e.g., LSTM networks) could enable continuous ECG monitoring and extended risk prediction in septic patients, potentially overcoming current limitations in temporal prediction and enabling personalized therapeutic interventions. To improve clinical utility, interpretability tools such as the SHAP method⁶³ should be incorporated to elucidate model decision-making processes. Enhanced AI transparency is critical for fostering clinician trust and guiding treatment strategies.

Conclusion

This systematic review comprehensively evaluates machine learning (ML) and deep learning (DL) models for predicting new-onset atrial fibrillation (NOAF) in sepsis patients, highlighting the potential of multimodal data integration. Traditional ML models (AUC 0.74–0.90) using tabular data (e.g., demographics, lab results) or ECG-derived features offer short-term prediction (minutes to hours), while DL models (AUC 0.74–0.96) processing raw ECG signals extend horizons to days, with performance scaling with dataset size. The proposed transformer-based multimodal framework, integrating clinical data and ECG signals, offers a promising avenue to enhance predictive performance by capturing systemic inflammation and atrial remodeling dynamics. Despite limitations such as retrospective data heterogeneity, the proposed framework offers valuable insights for addressing common challenges in cardiovascular AI research, particularly regarding data harmonization and clinical implementation.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076251384237 - Supplemental material for Prediction of new-onset atrial fibrillation in sepsis patients by machine learning: A systematic review

Supplemental material, sj-docx-1-dhj-10.1177_20552076251384237 for Prediction of new-onset atrial fibrillation in sepsis patients by machine learning: A systematic review by Shuxuan Ye, Wan Xu, Zhiyu Jiang, Yujie Zhan, Yuqiang Shen, Lan Su, Keda Yang and Bin Ju in DIGITAL HEALTH

Supplemental Material

sj-docx-2-dhj-10.1177_20552076251384237 - Supplemental material for Prediction of new-onset atrial fibrillation in sepsis patients by machine learning: A systematic review

Supplemental material, sj-docx-2-dhj-10.1177_20552076251384237 for Prediction of new-onset atrial fibrillation in sepsis patients by machine learning: A systematic review by Shuxuan Ye, Wan Xu, Zhiyu Jiang, Yujie Zhan, Yuqiang Shen, Lan Su, Keda Yang and Bin Ju in DIGITAL HEALTH

Footnotes

ORCID iD

Keda Yang

Ethical approval

Not applicable.

Contributorship

Keda Yang and Bin Ju conceived of the study idea and designed the search strategy. Shuxuan Ye and Wan Xu screened abstracts and full texts. Bin Ju and Lan Su performed the analysis and drew figures. Lan Su revised the manuscript. Yujie Zhan and Yuqiang Shen drafted the introduction, materials and methods sections. All authors cowrote and revised the manuscript for intellectual content. All authors provided their final approval for manuscript submission. All authors agree to be accountable for all aspects of the work.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Medical and Health Research Project of Zhejiang Province (2023KY1039) and the Higher Education Research Planning Project of the China Association of Higher Education 2024 (24YJ0403).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online.

References

1. Andrade, Khairy, Dobrev, et al. The clinical profile and pathophysiology of atrial fibrillation: relationships among clinical features, epidemiology, and mechanisms. Circ Res 2014; 114: 1453–1468.

2. Aibar, Schulman. New-onset atrial fibrillation in sepsis: a narrative review. Semin Thromb Hemost 2021; 47: 18–25.

3. Kuipers, Klein Klouwenberg, Cremer. Incidence, risk factors and outcomes of new-onset atrial fibrillation in patients with sepsis: a systematic review. Crit Care 2014; 18: 688.

4. Walkey, Wiener, Ghobrial, et al. Incident stroke and mortality associated with new-onset atrial fibrillation in patients hospitalized with severe sepsis. JAMA 2011; 306: 2248–2254.

5. Haug, Drazen. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med 2023; 388: 1201–1208.

6. Krishnan, Singh, Pathania, et al. Artificial intelligence in clinical medicine: catalyzing a sustainable global healthcare paradigm. Front Artif Intell 2023; 6: 1227091.

7. Bashar, Hossain, Ding, et al. Atrial fibrillation detection during sepsis: study on MIMIC III ICU data. IEEE J Biomed Health Inform 2020; 24: 3124–3135.

8. Chung, Bazoukis, Lee, et al. Machine learning techniques for arrhythmic risk stratification: a review of the literature. Int J Arrhythmia 2022; 23: 1–13.

9. Bomrah, Uddin, Upadhyay, et al. A scoping review of machine learning for sepsis prediction- feature engineering strategies and model performance: a step towards explainability. Crit Care 2024; 28: 180.

10. Syed, Sexton, et al. Application of machine learning in intensive care unit (ICU) settings using MIMIC dataset: systematic review. Informatics (MDPI) 2021; 8: 1–12.

11. Verhaeghe, De Corte, Sauer, et al. Generalizable calibrated machine learning models for real-time atrial fibrillation risk prediction in ICU patients. Int J Med Inform 2023; 175: 105086.

12. Karri, Kawai, Thong, et al. Machine learning outperforms existing clinical scoring tools in the prediction of postoperative atrial fibrillation during intensive care unit admission after cardiac surgery. Heart Lung Circ 2021; 30: 1929–1937.

13. Saeed, Villarroel, Reisner, et al. Multiparameter intelligent monitoring in intensive care II: a public-access intensive care unit database. Crit Care Med 2011; 39: 952–960.

14. Johnson, Pollard, Shen, et al. MIMIC-III, a freely accessible critical care database. Sci Data 2016; 3: 160035.

15. Johnson AEW, Bulgarelli, Shen, et al. Author correction: MIMIC-IV, a freely accessible electronic health record dataset. Sci Data 2023; 10: 219.

16. Chen, Javadi, Hamilton, et al. Quantifying deep neural network uncertainty for atrial fibrillation detection with limited labels. Sci Rep 2022; 12: 20140.

17. Moody, Goldberger, McClennen, et al. Predicting the onset of paroxysmal atrial fibrillation: the Computers in Cardiology Challenge 2001. In: Computers in cardiology 2001, Rotterdam, Netherlands, 2001, pp.113–116.

18. Bashar, Ding, Walkey, et al. Atrial fibrillation prediction from critically ill sepsis patients. Biosensors (Basel) 2021; 11: 1–20.

19. Ortega-Martorell, Pieroni, Johnston, et al. Development of a risk prediction model for new episodes of atrial fibrillation in medical-surgical critically ill patients using the AmsterdamUMCdb. Front Cardiovasc Med 2022; 9: 897709.

20. Guan, Gong, Zhao, et al. Interpretable machine learning model for new-onset atrial fibrillation prediction in critically ill patients: a multi-center study. Crit Care 2024; 28: 349.

21. Narin, Isler, Ozer, et al. Early prediction of paroxysmal atrial fibrillation based on short-term heart rate variability. Phys Stat Mech Its Appl 2018; 509: 56–65.

22. Tiwari, Colborn, Smith, et al. Assessment of a machine learning model applied to harmonized electronic health record data for the prediction of incident atrial fibrillation. JAMA Netw Open 2020; 3: e1919396.

23. Schnabel, Sullivan, Levy, et al. Development of a risk score for atrial fibrillation (Framingham Heart Study): a community-based cohort study. Lancet 2009; 373: 739–745.

24. Chamberlain, Agarwal, Folsom, et al. A clinical risk score for atrial fibrillation in a biracial prospective cohort (from the Atherosclerosis Risk in Communities [ARIC] study). Am J Cardiol 2011; 107: 85–91.

25. Nadarajah, Frangi, et al. Predicting patient-level new-onset atrial fibrillation from population-based nationwide electronic health records: protocol of FIND-AF for developing a precision medicine prediction model using artificial intelligence. BMJ Open 2021; 11: e052887.

26. Miotto, Wang, et al. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 2018; 19: 1236–1246.

27. Ranjbarzadeh, Bagherian Kasgari, Jafarzadeh Ghoushchi, et al. Brain tumor segmentation based on deep learning and an attention mechanism using MRI multi-modalities brain images. Sci Rep 2021; 11: 10930.

28. Aggarwal, Sounderajah, Martin, et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digit Med 2021; 4: 65.

29. Hossain, Rana, Higgins, et al. Natural language processing in electronic health records in relation to healthcare decision-making: a systematic review. Comput Biol Med 2023; 155: 106649.

30. Zeynali, Seyedarabi, Afrouzian. Classification of EEG signals using transformer based deep learning and ensemble models. Biomed Signal Process Control 2023; 86: 105130.

31. Anbalagan, Nath. AF identification from time–frequency analysis of ECG signal using deep neural networks. IEEE Sens Lett 2024; 8: 1–4.

32. Zhu, Jiang, Xia, et al. Atrial fibrillation prediction based on recurrence plot and ResNet. Sensors (Basel) 2024; 24.

33. Rooney, Kaufman, Murugan, et al. Forecasting imminent atrial fibrillation in long-term electrocardiogram recordings. J Electrocardiol 2023; 81: 111–116.

34. Yuan, Duffy, Dhruva, et al. Deep learning of electrocardiograms in sinus rhythm from US veterans to predict atrial fibrillation. JAMA Cardiol 2023; 8: 1131–1139.

35. Gadaleta, Harrington, Barnhill, et al. Prediction of atrial fibrillation from at-home single-lead ECG signals without arrhythmias. NPJ Digit Med 2023; 6: 229.

36. Melzi, Vera-Rodriguez, Tolosana, et al. Prediction of atrial fibrillation from sinus-rhythm electrocardiograms based on deep neural networks: analysis of time intervals and longitudinal study. IRBM 2023; 44: 100811.

37. Habineza, Ribeiro, Gedon, et al. End-to-end risk prediction of atrial fibrillation from the 12-lead ECG by deep neural networks. J Electrocardiol 2023; 81: 193–200.

38. Krones, Marikkar, Parsons, et al. Review of multimodal machine learning approaches in healthcare. Inf Fusion 2025; 114: 102690.

39. Zhou, Zhao, et al. Multimodal data integration for precision oncology: challenges and future directions. 2024.

40. Qiu, Miller, Joshi, et al. Multimodal deep learning for Alzheimer's disease dementia assessment. Nat Commun 2022; 13: 3404.

41. Xue, Kowshik, Lteif, et al. AI-based differential diagnosis of dementia etiologies on multimodal data. Nat Med 2024; 30: 2977–2989.

42. Korantzopoulos, Letsas, Tse, et al. Inflammation and atrial fibrillation: a comprehensive review. J Arrhythm 2018; 34: 394–401.

43. Ihara, Sasano. Role of inflammation in the pathogenesis of atrial fibrillation. Front Physiol 2022; 13: 862164.

44. Quesada-Ocete, Castillo Martinez, Garcia Gonzalez, et al. Role of inflammation in atrial fibrillation: specific biomarkers of tissue damage, atrial remodelling and dysregulation of angiogenesis. EP Europace 2024; 26: 1252–1253.

45. Valenti, Vitolo, Imberti, et al. Red cell distribution width: a routinely available biomarker with important clinical implications in patients with atrial fibrillation. Curr Pharm Des 2021; 27: 3901–3912.

46. Friedrichs, Klinke, Baldus. Inflammatory pathways underlying atrial fibrillation. Trends Mol Med 2011; 17: 556–563.

47. Pang, et al. Development and validation of a predictive model for new-onset atrial fibrillation in sepsis based on clinical risk factors. Front Cardiovasc Med 2022; 9: 968615.

48. Mathunjwa, Lin, et al. ECG arrhythmia classification by using a recurrence plot and convolutional neural network. Biomed Signal Process Control 2021; 64: 102262.

49. Mathunjwa, Lin, et al. ECG recurrence plot-based arrhythmia classification using two-dimensional deep residual CNN features. Sensors (Basel) 2022; 22: 1660.

50. Labib, Nahid. OptRPC: a novel and optimized recurrence plot-based system for ECG beat classification. Biomed Signal Process Control 2022; 72: 103328.

51. Gao, Yan, Gao, et al. Automatic detection of epileptic seizure based on approximate entropy, recurrence quantification analysis and convolutional neural networks. Artif Intell Med 2020; 102: 101711.

52. Xie, Zhou, Gao, et al. MedTrinity-25M: a large-scale multimodal dataset with multigranular annotations for medicine. 2024.

53. Siragusa, Contino, Ciura, et al. MedPix 2.0: a comprehensive multimodal biomedical data set for advanced AI applications with retrieval augmented generation and knowledge graphs. Data Sci Eng 2025.

54. Bai, Huang, et al. M3D: advancing 3D medical image analysis with multi-modal large language models. In: ICLR 2025 conference, 2024.

55. Clifford, Liu, Moody, et al. Classification of normal/abnormal heart sound recordings: the PhysioNet/computing in cardiology challenge 2016. In: 2016 computing in cardiology conference (CinC), Vancouver, BC, Canada, 2016, pp.609–612.

56. Clifford, Liu, Moody, et al. AF classification from a short single lead ECG recording: the PhysioNet/computing in cardiology challenge 2017. Comput Cardiol (2010) 2017; 44: 1–11.

57. Thiagarajan, Rajan, Katoch, et al. DDxnet: a deep learning model for automatic interpretation of electronic health records, electrocardiograms and electroencephalograms. Sci Rep 2020; 10: 16428.

58. Barker, Khavandi, et al. Machine learning in sudden cardiac death risk prediction: a systematic review. Europace 2022; 24: 1777–1787.

59. Attia, Noseworthy, Lopez-Jimenez, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 2019; 394: 861–867.

60. Goldstein, Chang, Mitani, et al. Near-term prediction of sudden cardiac death in older hemodialysis patients using electronic health records. Clin J Am Soc Nephrol 2014; 9: 82–91.

61. Hannun, Rajpurkar, Haghpanahi, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat Med 2019; 25: 65–69.

62. Tse, Zhou, Lee, et al. Incorporating latent variables using nonnegative matrix factorization improves risk stratification in Brugada syndrome. J Am Heart Assoc 2020; 9: e012714.

63. Liu, et al. Predicting mortality in intensive care unit patients with heart failure using an interpretable machine learning model: retrospective cohort study. J Med Internet Res 2022; 24: e38082.
