Abstract
Objective
Sepsis-associated acute kidney injury (SA–AKI) is a major contributor to multi-organ failure and often leads to complications such as new-onset atrial fibrillation (NOAF). NOAF is associated with poorer outcomes, including increased mortality. However, current methods for predicting NOAF in patients with SA–AKI remain limited.
Methods
This retrospective cohort study used data from the MIMIC-IV database to identify 12,956 adult patients with SA–AKI, among whom 2,708 developed NOAF. Machine learning (ML) techniques, including Boruta feature selection and nine predictive algorithms, were applied to identify key predictors and develop prediction models for NOAF. Model performance was evaluated using metrics such as the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 score. SHapley Additive exPlanations (SHAP) values were used to enhance model interpretability and identify the most influential predictors.
Results
XGBoost demonstrated the best predictive performance, achieving an AUC of 0.83 on the test set. The top predictors included age, creatinine, mean blood pressure, congestive heart failure, temperature, and anion gap. SHAP analysis confirmed the substantial contribution of these factors to predicted NOAF risk. A simplified model retaining eight key variables maintained good predictive performance (AUC = 0.80) while improving practical applicability. A web-based platform was developed for real-time risk assessment.
Conclusions
This study presents an interpretable ML model for predicting NOAF in patients with SA–AKI that performed well on internal testing. By identifying critical risk factors, the model may assist clinicians in implementing timely interventions to improve patient outcomes. Further multicenter validation is required to confirm these findings and refine risk prediction across diverse patient populations.
1. Introduction
Sepsis is a life-threatening syndrome that often precipitates multi-organ dysfunction, including acute kidney injury (AKI). 1 New-onset atrial fibrillation (NOAF) is the most common arrhythmia in the intensive care unit (ICU), 2 with its incidence varying based on illness severity—reported in approximately 2% to 40% of septic ICU patients. 3 Patients with sepsis-associated acute kidney injury (SA–AKI) are particularly susceptible to NOAF. 4 For instance, among critically ill patients with severe AKI requiring dialysis, around 37% develop NOAF during their ICU stay. 5 The occurrence of NOAF in sepsis is not benign—it is increasingly recognized as a marker of heightened disease severity, essentially an additional organ failure in septic shock. 6 Studies consistently demonstrate that sepsis patients who develop NOAF have worse outcomes, including prolonged ICU stays, higher risks of stroke and heart failure, and elevated short-term and long-term mortality. 7 These adverse prognostic implications underscore the clinical significance of NOAF in the context of sepsis and AKI.
However, predicting NOAF in SA–AKI remains a formidable challenge. 8 Numerous risk factors for NOAF have been proposed—such as advanced age, severity of infection, inflammation, and electrolyte disturbances—but translating these disparate factors into a practical prediction tool has proven difficult.7,9 Currently, most studies on NOAF prediction have focused on postoperative cardiac surgical patients or the general ICU population,10,11 whereas studies specifically targeting septic patients in the ICU, particularly those with AKI, are relatively scarce. In real-world ICU practice, many septic patients who develop NOAF (often in the setting of multi-organ failure like AKI) do not receive prompt cardiologic evaluation or prophylactic interventions. 12 This gap in care highlights the limitations of current risk stratification approaches and the pressing need for more accurate and actionable predictive models for NOAF in SA–AKI.
Machine learning (ML) has emerged as a promising approach to improve risk prediction in critical illness. 13 In particular, ensemble tree-based models such as Random Forest (RF) and Extreme Gradient Boosting (XGBoost) have shown strong performance in modeling complex clinical data. 14 These ML models have been widely applied to ICU prognostication and have achieved satisfactory performance in identifying patients at risk of adverse events. 15 Preliminary studies have also applied machine learning to AF risk prediction, for example in patients after cardiac surgery. 16 However, robust predictive models specifically targeting NOAF in septic patients with AKI are still lacking. This gap in the current predictive landscape underscores the need for dedicated research to address this unmet clinical challenge.
In this study, we aim to develop an interpretable machine learning model, using data from the MIMIC-IV database, to predict NOAF in patients with SA–AKI, addressing a key gap in critical care cardiology. Our goal is to deploy the model on a web-based platform, providing real-time access for clinicians. This platform will help identify patients at risk for NOAF and guide monitoring or preventive strategies. We will also demonstrate how interpretable ML techniques, such as SHapley Additive exPlanations (SHAP), can bridge the gap between predictive analytics and clinical decision-making.
2. Materials and methods
2.1. Data source
MIMIC-IV (version 3.1) is a publicly available, de-identified clinical database widely utilized in critical care research. It includes detailed electronic health records from patients admitted to the intensive care units of Beth Israel Deaconess Medical Center. The database provides a wealth of data, including demographic information, vital signs, laboratory test results, medication records, and documented diagnoses. In this study, Ge (ID: 13547277), one of the authors, fully adhered to data use agreements and institutional guidelines for MIMIC-IV and was responsible for the data extraction process.
Patients were selected from the MIMIC-IV database. A total of 45,599 adult patients with sepsis, defined according to Sepsis-3.0 criteria, were initially identified upon ICU admission. After excluding 10,645 patients with multiple ICU admissions, 10,788 with ICU stays of less than 48 hours, and 6,528 without AKI, 17,638 patients with SA–AKI remained. A further 4,682 patients with a prior history of AF or with AF occurring on the first ICU day were excluded. The final study cohort consisted of 12,956 SA–AKI patients, who were divided into two groups: 10,248 in the Non-NOAF group and 2,708 in the NOAF group (Figure 1). All patient data were de-identified so that individual patients could not be identified in any way. This retrospective cohort study was conducted in accordance with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement. 17
Flowchart of cohort selection from the MIMIC-IV database.
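The exclusion cascade described above can be checked with simple arithmetic. The minimal sketch below re-derives the cohort sizes from the counts given in the text. The helper name `apply_exclusions` and the step labels are illustrative only and are not part of the study's code.

```python
def apply_exclusions(start: int, steps: list) -> int:
    """Subtract each (label, count) exclusion in order, printing the running total."""
    remaining = start
    for label, count in steps:
        remaining -= count
        print(f"after excluding {label}: {remaining:,}")
    return remaining


# Counts copied from the cohort description (adult Sepsis-3.0 patients in MIMIC-IV).
sa_aki = apply_exclusions(45_599, [
    ("multiple ICU admissions", 10_645),
    ("ICU stay < 48 h", 10_788),
    ("no AKI", 6_528),
])
final_cohort = apply_exclusions(sa_aki, [("prior AF or AF on ICU day 1", 4_682)])

noaf, non_noaf = 2_708, 10_248
assert noaf + non_noaf == final_cohort  # the two outcome groups make up the whole cohort
print(f"NOAF incidence: {noaf / final_cohort:.1%}")  # about 20.9%
```

The same check also gives the outcome prevalence (roughly one in five patients). This is the class balance that the stratified split in Section 2.4 is meant to preserve.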
2.2. Data collection
We collected comprehensive demographic, clinical, and laboratory information from patients admitted to the ICU. Specifically, the variables included age, gender, BMI, and race, as well as clinical scores such as the Sequential Organ Failure Assessment (SOFA), Acute Physiology Score III (APS III), Oxford Acute Severity of Illness Score (OASIS), Glasgow Coma Scale (GCS), Systemic Inflammatory Response Syndrome (SIRS) score, and the Charlson Comorbidity Index. Vital signs, including heart rate, blood pressure (systolic and mean), respiratory rate, temperature, and oxygen saturation (SpO2), were recorded, together with total CO2. Laboratory tests encompassed white blood cell count (WBC), red blood cell count (RBC), platelets, electrolytes (potassium, sodium, calcium, chloride, magnesium, phosphate), glucose, international normalized ratio (INR), prothrombin time (PT), partial thromboplastin time (PTT), pH, base excess (BE), anion gap, blood urea nitrogen (BUN), and creatinine; urine output was also recorded. Treatments such as mechanical ventilation (MV), continuous renal replacement therapy (CRRT), epinephrine, vasopressin, and neuromuscular blockade were also included, along with outcomes such as hospital length of stay, ICU length of stay, and in-hospital mortality.
In this study, mean arterial pressure (MAP), referred to as MBP, was extracted from the MBP field of the MIMIC-IV vital sign table. According to the MIMIC-IV data dictionary, this field may include measurements obtained from invasive arterial catheters or from non-invasive oscillometric cuffs. In ICU patients with an arterial line, MAP can be obtained directly from the invasive measurement. For patients without an arterial line, MAP can be estimated from non-invasive cuff measurements using the standard formula MAP ≈ DBP + 1/3 × (SBP − DBP), where SBP and DBP denote systolic and diastolic blood pressure.
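The cuff-based estimate above is simple enough to write out directly. The sketch below implements that formula and adds a fallback rule that prefers a directly measured arterial-line value when one exists. Both function names (`estimate_map`, `resolve_map`) are hypothetical. The fallback rule is an illustrative assumption and does not describe how MIMIC-IV populates its MBP field.

```python
from typing import Optional


def estimate_map(sbp: float, dbp: float) -> float:
    """Estimate mean arterial pressure (mmHg) as DBP + (SBP - DBP) / 3."""
    if sbp < dbp:
        raise ValueError("systolic pressure cannot be lower than diastolic pressure")
    return dbp + (sbp - dbp) / 3.0


def resolve_map(invasive_map: Optional[float], sbp: float, dbp: float) -> float:
    """Prefer a directly measured (arterial line) MAP; otherwise use the cuff estimate."""
    if invasive_map is not None:
        return invasive_map
    return estimate_map(sbp, dbp)


print(round(estimate_map(120, 80), 1))  # 93.3
print(resolve_map(88.0, 120, 80))       # 88.0, the arterial-line value is used as-is
```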
All data were extracted using PostgreSQL software, focusing on measurements and treatments recorded during the first 24 hours following ICU admission. Variables with more than 30% missingness were excluded from the analysis. For the remaining variables, missing values were handled using multiple imputation by chained equations (MICE). Continuous variables were imputed using predictive mean matching, while categorical variables were imputed using logistic regression. To prevent information leakage, imputation was performed separately for the training and testing datasets. Five imputed datasets were generated for each set, and analyses were conducted on each dataset, with the results pooled according to Rubin’s rules to account for imputation uncertainty.
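Two steps of the missing-data workflow above reduce to short, checkable rules: dropping variables with more than 30% missingness, and pooling a per-imputation result with Rubin's rules. The sketch below shows both under stated assumptions. It does not implement MICE itself. In R, that is typically done with the mice package, whose `pool()` function applies Rubin's rules. The function names and all inputs below are made up for illustration.

```python
import statistics


def keep_variables(rows, max_missing=0.30):
    """Names of variables whose fraction of missing (None) values is at most max_missing."""
    kept = []
    for name in rows[0]:
        missing = sum(1 for row in rows if row.get(name) is None)
        if missing / len(rows) <= max_missing:
            kept.append(name)
    return kept


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance), where T = W + (1 + 1/m) * B.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)      # Q-bar: mean of the m estimates
    within = statistics.fmean(variances)      # W: mean within-imputation variance
    between = statistics.variance(estimates)  # B: between-imputation variance (m - 1 denominator)
    return pooled, within + (1 + 1 / m) * between


# Made-up inputs: var_a is 75% missing (dropped), var_b is 25% missing (kept).
rows = [
    {"var_a": None, "var_b": 1.2},
    {"var_a": 2.1, "var_b": None},
    {"var_a": None, "var_b": 0.9},
    {"var_a": None, "var_b": 1.5},
]
print(keep_variables(rows))  # ['var_b']

# Five made-up estimates (one per imputed dataset), each with variance 1e-4.
pooled, total = pool_rubin([0.82, 0.83, 0.81, 0.84, 0.82], [1e-4] * 5)
print(round(pooled, 3), round(total, 6))
```

The between-imputation term inflates the total variance. This is how Rubin's rules carry imputation uncertainty into the pooled result.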
2.3. Definitions and outcomes
Sepsis was defined according to Sepsis-3.0 criteria as a suspected or documented infection accompanied by an acute increase in SOFA score of ≥2. 18 AKI was identified using KDIGO guidelines: an increase in serum creatinine of ≥0.3 mg/dL within 48 hours, ≥1.5 times the baseline within 7 days, or urine output <0.5 mL/kg/h for at least 6 hours. 19 SA-AKI was defined as AKI occurring within 7 days of sepsis onset. 20 The primary outcome was the development of NOAF after the first ICU day. NOAF was determined based on bedside cardiac rhythm documentation recorded by ICU nurses in the electronic medical records. 2
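The KDIGO and SA–AKI definitions above can be expressed as simple predicates. The sketch below is a simplified illustration, not the study's extraction logic. Its assumptions are:

- the 7-day creatinine window is anchored at the first available reading;
- the urine criterion uses the mean rate over any 6 consecutive hourly values;
- SA–AKI requires AKI onset within 7 days *after* sepsis onset.

All names (`Creatinine`, `creatinine_criterion`, `urine_criterion`, `is_sa_aki`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Creatinine:
    hours: float  # time of measurement, in hours from an arbitrary reference
    mg_dl: float  # serum creatinine, mg/dL


def creatinine_criterion(readings, baseline_mg_dl):
    """KDIGO creatinine arm: rise >= 0.3 mg/dL within 48 h, or >= 1.5x baseline within 7 days."""
    readings = sorted(readings, key=lambda r: r.hours)
    for i, later in enumerate(readings):
        for earlier in readings[:i]:
            if later.hours - earlier.hours <= 48 and later.mg_dl - earlier.mg_dl >= 0.3:
                return True
    start = readings[0].hours  # simplification: 7-day window starts at the first reading
    return any(r.hours - start <= 7 * 24 and r.mg_dl >= 1.5 * baseline_mg_dl for r in readings)


def urine_criterion(hourly_ml, weight_kg, hours=6, limit=0.5):
    """KDIGO urine arm: mean output < 0.5 mL/kg/h over any `hours` consecutive hourly values."""
    for start in range(len(hourly_ml) - hours + 1):
        rate = sum(hourly_ml[start:start + hours]) / (hours * weight_kg)
        if rate < limit:
            return True
    return False


def is_sa_aki(aki_onset_h, sepsis_onset_h):
    """Assumed one-sided window: AKI onset within 7 days after sepsis onset."""
    return 0 <= aki_onset_h - sepsis_onset_h <= 7 * 24


print(creatinine_criterion([Creatinine(0, 1.0), Creatinine(24, 1.4)], 1.0))  # True: +0.4 in 24 h
print(urine_criterion([30] * 6, weight_kg=70))  # True: about 0.43 mL/kg/h
print(is_sa_aki(aki_onset_h=36, sepsis_onset_h=0))  # True
```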
2.4. Statistical analysis
Continuous data were summarized as either the mean ± standard deviation or the median with interquartile range, depending on their distribution. Categorical data were reported as proportions. The Kolmogorov-Smirnov test was employed to evaluate the normality of continuous variables. For comparisons, t-tests or ANOVA were utilized when the data followed a normal distribution, whereas the Mann-Whitney U test or the Kruskal-Wallis test was applied for non-normally distributed variables.
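For the non-normally distributed variables, the Mann-Whitney U test compares rank sums between the two groups. The sketch below is a hand-rolled illustration using the large-sample normal approximation with a tie correction and no continuity correction. The analyses in this study were run in R, where `wilcox.test` provides this test. The function names here are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, z, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())  # tie-correction term
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```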
We performed stratified sampling based on the outcome event, splitting the dataset 7:3 into training and test sets. The Boruta algorithm was then applied to the training set to select important predictors.
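A stratified 7:3 split shuffles and splits each outcome class separately, so the NOAF proportion is nearly identical in both sets. The sketch below assumes this approach. The function name and the random seed are hypothetical, and the study's own split was performed in R.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=2024):
    """Return (train_idx, test_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_frac)  # per-class cut keeps the class ratio
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)


# Outcome labels with the cohort's class sizes (1 = NOAF, 0 = Non-NOAF).
labels = [1] * 2_708 + [0] * 10_248
train, test = stratified_split(labels)
print(len(train), len(test), sum(labels[i] for i in test))
```

With these class sizes, about 812 of the roughly 3,886 test-set patients have NOAF. The same shuffling-within-class idea underlies stratified folds in cross-validation.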
We trained nine models on the training set: Logistic Regression (LR), Decision Tree (DT), Ridge Regression (Ridge), Elastic Net (ENet), k-Nearest Neighbors (KNN), RF, XGBoost, Support Vector Machine (SVM), and Multilayer Perceptron (MLP). Hyperparameters were tuned on the training set using grid search with five-fold cross-validation. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), decision curves, calibration curves, accuracy, sensitivity (recall), specificity, and the F1 score. After selecting the best-performing model, we applied SHAP values to interpret its predictions and generated a clinical impact curve (CIC) for further evaluation.
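The threshold-based metrics above all come from one confusion matrix. The AUC has a threshold-free interpretation: the probability that a randomly chosen patient with NOAF receives a higher predicted risk than one without. The sketch below computes both from made-up toy predictions. The function names and data are illustrative only, and the pairwise AUC loop is quadratic, so it is meant for clarity rather than speed.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Confusion-matrix metrics at a probability threshold; sensitivity is the same as recall."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auc_by_pairs(y_true, y_score):
    """AUC = P(random positive scores higher than random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy predictions for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.55]
print(classification_metrics(y_true, y_score))  # every metric equals 0.75 here
print(auc_by_pairs(y_true, y_score))            # 0.875
```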
All analyses used R software (version 4.4.2), with significance set at a two-sided P < 0.05.
3. Results
3.1. Baseline characteristics
Comparison of the characteristics between the NOAF and Non-NOAF groups.
SOFA: Sequential organ failure assessment, GCS: Glasgow Coma Scale, APS III: Acute Physiology Score III, OASIS: Oxford Acute Severity of Illness Score, SpO2: Oxygen saturation, SBP: Systolic blood pressure, MBP: Mean Blood Pressure, WBC: White blood cell count, RBC: Red blood cell count, Platelet: Platelet count, AKI: Acute kidney injury, INR: International normalized ratio, BE: Base Excess, BUN: blood urea nitrogen, MV: Mechanical Ventilation, CRRT: Continuous renal replacement therapy. Bold indicates statistical significance.
3.2. Predictor selection
Baseline characteristics of the training and validation sets.
SOFA: Sequential organ failure assessment, GCS: Glasgow Coma Scale, APS III: Acute Physiology Score III, OASIS: Oxford Acute Severity of Illness Score, SpO2: Oxygen saturation, SBP: Systolic blood pressure, MBP: Mean Blood Pressure, WBC: White blood cell count, RBC: Red blood cell count, Platelet: Platelet count, AKI: Acute kidney injury, INR: International normalized ratio, BE: Base Excess, BUN: blood urea nitrogen, MV: Mechanical Ventilation, CRRT: Continuous renal replacement therapy. Bold indicates statistical significance.

Feature selection using the Boruta algorithm. Green indicates confirmed important features, and red indicates unimportant features.
3.3. Establishment and validation of the prediction model
The variables confirmed by the Boruta algorithm were then used to train the nine predefined models. Hyperparameters were tuned on the training set using grid search (Supplemental Table S1). The test-set AUCs were as follows: XGBoost 0.8336 (95% CI: 0.8222–0.8511), RF 0.8199 (95% CI: 0.8042–0.8356), MLP 0.8184 (95% CI: 0.8029–0.8339), SVM 0.7938 (95% CI: 0.7766–0.8110), LR 0.7899 (95% CI: 0.7734–0.8064), Ridge 0.7885 (95% CI: 0.7719–0.8051), ENet 0.7885 (95% CI: 0.7719–0.8051), DT 0.7847 (95% CI: 0.7675–0.8018), and KNN 0.7214 (95% CI: 0.7019–0.7409) (Figure 3).
ROC curves for nine machine learning models in the training set (a) and test set (b).
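The text does not state how the 95% CIs for the AUCs were obtained. DeLong's method is a common choice; for example, `ci.auc` in the R package pROC uses it by default. As a simpler, self-contained illustration, the sketch below instead uses the Hanley–McNeil standard error, which needs only the AUC and the numbers of events and non-events. The event counts in the usage line are hypothetical approximations of a 30% stratified test set, not values reported in the text.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate Wald-type CI for an AUC using the Hanley-McNeil standard error."""
    q1 = auc / (2 - auc)           # P(two random positives both outrank one negative)
    q2 = 2 * auc ** 2 / (1 + auc)  # P(one random positive outranks two negatives)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical test-set sizes (about 30% of 2,708 events and 10,248 non-events).
low, high = hanley_mcneil_ci(0.82, n_pos=812, n_neg=3074)
print(f"{low:.4f}-{high:.4f}")
```

An interval built this way is symmetric around the point estimate. DeLong or bootstrap intervals, which need the individual predictions, are generally preferred when comparing correlated AUCs on the same test set.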
Evaluation of machine learning model performance in NOAF prediction.

(a) Calibration curves for nine models in the test set; (b) clinical impact curve (CIC) for the XGBoost model; (c) decision curve analysis (DCA) for all models in the test set.
The CIC for the XGBoost model displays the number of high-risk patients identified per 1,000 individuals at varying thresholds. The red solid line shows the total number of individuals classified as high-risk, while the blue dashed line indicates the subset who actually experience the event of interest. As thresholds rise, the number of identified high-risk individuals decreases, illustrating the balance between sensitivity and specificity. The threshold associated with the optimal Youden Index of 0.50 was selected as the reference point for applying the model (Figure 4).
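The quantities behind the threshold choice, the CIC, and the decision curve reduce to counts at a given probability threshold. The sketch below computes three of them on made-up toy data:

- Youden's J (sensitivity + specificity − 1), maximized over candidate thresholds;
- the CIC pair (number classified high risk, true events among them) per 1,000 patients;
- decision-curve net benefit, TP/n − FP/n × pt/(1 − pt).

All function names and data are hypothetical. The sketch is not intended to reproduce the study's curves, which were generated in R.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_thr, best_j = None, -1.0
    for thr in sorted(set(y_score)):
        tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= thr)
        tn = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s < thr)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j


def clinical_impact(y_true, y_score, threshold, per=1000):
    """CIC point: (classified high risk, true events among them), scaled per `per` patients."""
    flagged = [t for t, s in zip(y_true, y_score) if s >= threshold]
    return per * len(flagged) / len(y_true), per * sum(flagged) / len(y_true)


def net_benefit(y_true, y_score, pt):
    """Decision-curve net benefit at threshold probability pt."""
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= pt)
    fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= pt)
    return tp / n - fp / n * pt / (1 - pt)


# Made-up toy predictions for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.55]
thr, j = youden_threshold(y_true, y_score)
print(thr, j)                                       # 0.4 0.75
print(clinical_impact(y_true, y_score, thr))        # (625.0, 500.0)
print(round(net_benefit(y_true, y_score, thr), 4))  # 0.4167
```

Sweeping `threshold` over a grid and plotting the two `clinical_impact` values gives the red and blue CIC lines described above. Sweeping `pt` in `net_benefit` gives a decision curve.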
3.4. Explanation analysis
In our XGBoost model for predicting NOAF, SHAP values were used to identify and rank the most influential predictors. The eight most important variables were age, creatinine, MBP, congestive heart failure, temperature, anion gap, BUN, and SBP. These variables contributed most to the model's output, providing insight into the factors associated with NOAF in critically ill patients with SA–AKI (Figure 5).
SHAP interpretation. (a) SHAP summary plot showing global feature importance; (b) SHAP decision plot illustrating individual prediction pathways.
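A SHAP value is a Shapley value. It is a feature's average marginal contribution to a prediction across all coalitions of the other features. A global ranking like the one above is commonly obtained from the mean absolute SHAP value per feature. The sketch below computes exact Shapley values by brute-force enumeration for a toy model, against a single background point. It then ranks features by mean |SHAP|. This is a swapped-in illustration, not the study's method. For tree ensembles such as XGBoost, values are usually computed with TreeSHAP, for example via `shap.TreeExplainer` or XGBoost's `predict(..., pred_contribs=True)`. The toy model and the features `x1` to `x3` are hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, background):
    """Exact Shapley values of model(x) relative to one background point.

    Features outside a coalition take their background value. The cost grows as
    2**n, so this is only practical for a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        point = {k: (x[k] if k in coalition else background[k]) for k in names}
        return model(point)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def rank_by_mean_abs_shap(model, rows, background):
    """Global importance: mean |SHAP value| per feature over rows, largest first."""
    totals = {k: 0.0 for k in background}
    for row in rows:
        for k, v in exact_shapley(model, row, background).items():
            totals[k] += abs(v)
    return sorted(((k, t / len(rows)) for k, t in totals.items()),
                  key=lambda kv: kv[1], reverse=True)


def toy_model(p):
    """Hypothetical model: one additive term plus one two-way interaction."""
    return 2 * p["x1"] + p["x2"] * p["x3"]


background = {"x1": 0.0, "x2": 0.0, "x3": 0.0}
phi = exact_shapley(toy_model, {"x1": 1.0, "x2": 2.0, "x3": 3.0}, background)
print(phi)  # x1 gets about 2; the interaction's 6 is split equally, about 3 each to x2 and x3
```

The values sum to the difference between the prediction and the background prediction (the efficiency property). This additivity is why a SHAP decision plot can trace each patient's path from the baseline to the final risk.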
3.5. Real-world clinical implementation
To improve clinical applicability, computational efficiency, and ease of deployment, we refined the predictive model using a more concise set of variables. We retrained the model using the eight most important variables identified by SHAP values from the XGBoost model. This streamlined model maintained good predictive performance (AUC = 0.80) while improving real-time usability and integration into clinical workflows. To facilitate real-world clinical implementation, we deployed the model as a web-based decision support tool at https://doctorge.shinyapps.io/NOAF/. This platform enables clinicians to rapidly assess atrial fibrillation risk, supporting timely risk stratification, preventive measures, and personalized management. A hypothetical patient example illustrates how the web-based tool can guide clinical decision-making (Figure 6).
Illustration of the web-based predictive tool for new-onset atrial fibrillation (NOAF) in patients with SA–AKI. Users can input clinical variables, including age, systolic blood pressure (SBP), mean blood pressure (MBP), temperature, anion gap, blood urea nitrogen (BUN), creatinine, and presence of congestive heart failure.
4. Discussion
To our knowledge, this is the first study to develop a predictive model for NOAF in patients with SA–AKI. Nine machine learning models were developed using 24 clinical variables collected within the first 24 hours of ICU admission. Among these, the XGBoost model demonstrated excellent discrimination, reliable calibration, and considerable potential for clinical application. Its performance on the held-out test set supported the model's accuracy and robustness. SHAP analysis was applied to interpret the model's outputs. It showed that age, creatinine, MBP, congestive heart failure, temperature, anion gap, BUN, and SBP played the most important roles in prediction. The SHAP summary plot clarified the model's prediction process, allowing more transparent and interpretable risk assessment.
In our study, the XGBoost-based model for NOAF in SA–AKI patients achieved an AUC of 0.83, which is higher than the AUCs reported in previous related studies. One previous study introduced a clinical prediction model for NOAF in patients with acute myocardial infarction after percutaneous coronary intervention. It identified the triglyceride-glucose index, left atrial diameter, age, systemic inflammation response index, and creatinine as key risk factors. However, its reliance on linear regression and its AUC of 0.78 limited its predictive strength. 7 Jarne Verhaeghe et al. developed CatBoost models for NOAF prediction with an AUC of 0.81 but did not deploy a web interface for clinical use. 21 Notably, neither study addressed NOAF prediction in SA–AKI patients, which highlights the novelty of our approach.
The association between SA–AKI and NOAF involves complex interactions among inflammation, metabolic disturbances, and hemodynamic instability. Using SHAP values, we identified age, creatinine, MBP, congestive heart failure, temperature, anion gap, BUN, and SBP as the most influential variables. Each of these factors may contribute to NOAF through distinct mechanisms that reflect the broader pathophysiology of sepsis and AKI. First, elevated creatinine and BUN reflect profound renal impairment in these patients. Accumulated uremic toxins may alter myocardial electrophysiology, directly or indirectly, and disrupt the cardiac action potential. 22 Uremic toxins have been implicated in increasing myocardial inflammation, promoting fibrosis, and altering calcium handling, all of which can heighten susceptibility to arrhythmias.23,24 Age, identified as the most important predictor, reflects the cumulative cardiovascular and systemic vulnerability of older patients, which may predispose them to arrhythmogenesis. Changes in anion gap indicate significant acid-base disturbances. 25 Acidosis or alkalosis may modulate the function of pH-sensitive ion channels, shift the equilibrium of calcium ions, and affect cellular excitability, thereby contributing to an arrhythmogenic substrate.26,27 Elevated body temperature and mean blood pressure instability are additional indicators of severe sepsis-related stress.28,29 Hyperthermia, a hallmark of sepsis, increases metabolic demand and may affect ion channel kinetics, particularly those responsible for depolarization and repolarization. 30 Meanwhile, hemodynamic instability reflected in mean blood pressure variation can compromise myocardial perfusion, trigger ischemic injury, and cause further electrical remodeling. 31 These processes may exacerbate the electrical heterogeneity that underlies NOAF initiation and maintenance. 32
Although magnesium was not among the eight most influential predictors, magnesium levels provide further insight into electrolyte and metabolic perturbations. 33 Hypomagnesemia has long been recognized as a critical factor in increasing atrial excitability and reducing electrical stability. 34 Magnesium depletion can impair sodium-potassium ATPase activity, prolong action potential duration, and facilitate early afterdepolarizations. 35 Anion gap, as a marker of systemic metabolic balance, may also influence intracellular pH, ion channel activity, and the overall electrophysiological environment, further predisposing patients to arrhythmias.36,37 By identifying these variables, we begin to uncover the multifactorial pathways linking SA–AKI to NOAF. Integrating SHAP analysis into the predictive model highlights key contributors. It also suggests that interventions targeting these metabolic, hemodynamic, and electrolyte disturbances may mitigate the risk of NOAF in this critically ill population.
Our findings suggest that predicting NOAF in patients with SA–AKI has significant clinical value. Early identification of high-risk patients could enable clinicians to implement timely, focused interventions. These include enhanced cardiac monitoring, preventive measures, and more rigorous management of fluid balance, electrolyte levels, and acid-base status. Such measures may reduce complications, stabilize hemodynamics, and ultimately improve patient outcomes. Notably, we have deployed this predictive model on an intuitive web-based platform, allowing clinicians to access it conveniently and obtain real-time risk assessments. This deployment helps bridge the gap between predictive modeling and everyday clinical practice, making data-driven risk assessment a practical component of routine care. This approach may facilitate proactive management, earlier intervention, and better long-term outcomes for critically ill patients with SA–AKI.
Our study has several limitations. First, the absence of key cardiac biomarkers and echocardiographic parameters, such as BNP and left ventricular ejection fraction, is a significant drawback. These variables are crucial for understanding the cardiac structural and functional changes that influence NOAF risk. Second, the retrospective nature of data collection may introduce documentation errors or inconsistencies, potentially affecting the reliability of the findings. Third, all predictors were averaged over the first 24 hours of ICU admission, which may obscure important temporal changes such as worsening renal function or hemodynamic instability. Future studies using time-varying or longitudinal modeling could improve predictive performance. Finally, the single-center design may limit the generalizability of the findings. External validation has not been conducted, and model performance could differ across healthcare settings and patient populations. These considerations highlight the need for large-scale, multicenter, multi-ethnic prospective studies to validate our results, refine risk stratification models, and address existing gaps in clinical data.
5. Conclusion
This study developed an interpretable machine learning model to predict NOAF in patients with sepsis-associated acute kidney injury. The model showed promising predictive performance and identified several clinically relevant predictors. These findings may help support early risk stratification in this population. Further external validation, particularly through multicenter prospective studies, is needed before potential clinical application.
Supplemental material
Supplemental material - Development of an interpretable machine learning model for predicting new-onset atrial fibrillation in patients with sepsis-associated acute kidney injury: A retrospective cohort study
Supplemental material for Development of an interpretable machine learning model for predicting new-onset atrial fibrillation in patients with sepsis-associated acute kidney injury: A retrospective cohort study by Yuanshuo Ge, Guangdong Wang, Linlin Zhang, Yang Miao, Hui Wu, Ye Hu and Cunlin Yin in Science Progress.
Footnotes
Ethical considerations
The study was approved by the Institutional Review Boards (IRB) of the Massachusetts Institute of Technology (MIT) and Beth Israel Deaconess Medical Center (BIDMC). The requirement for individual informed consent was waived due to the retrospective and observational nature of the study. The study complied with the ethical standards of the Declaration of Helsinki.
Author contributions
YG: Data curation, Formal analysis, Methodology, Writing – original draft. GW and YH: Writing – review & editing. LZ and YM: Data curation. HW: Methodology. CY: Conceptualization, Supervision, Writing – review & editing. All authors read and approved the final draft.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The authors affirm that the study was carried out without any commercial or financial affiliations that might be perceived as a potential conflict of interest.
Supplemental material
Supplemental material for this article is available online.
References
