Abstract
Background:
Atrial fibrillation (AF) in the elderly population is projected to increase over the next several decades. Catheter ablation shows promise as a treatment option and is becoming increasingly available. We examined 90-day hospital readmission for AF patients undergoing catheter ablation and utilized machine learning methods to explore the risk factors associated with these readmission trends.
Methods:
Data from the 2013 Nationwide Readmissions Database on AF cases were used to predict 90-day readmissions for AF with catheter ablation. Multiple machine learning methods such as k-Nearest Neighbors, Decision Tree, and Support Vector Machine were employed to determine variable importance and build risk prediction models. Accuracy, precision, sensitivity, specificity, and area under the curve were compared for each model.
Results:
The 90-day hospital readmission rate was 17.6%; the average age of the patients was 64.9 years; 62.9% of patients were male. Important variables in predicting 90-day hospital readmissions in patients with AF undergoing catheter ablation included the age of the patient, the number of diagnoses on the patient’s record, and the total number of discharges from a hospital. k-Nearest Neighbors had the best performance, with a prediction accuracy of 85%. Decision Tree followed at 72.3%, and Support Vector Machine performed worst at 62.6%.
Conclusions:
Machine learning methods can produce accurate models for predicting hospital readmissions in patients with AF. The likelihood of readmission to the hospital increases with patient age, the total number of hospital discharges, and the total number of patient diagnoses. Findings from this study can inform quality improvement in healthcare and help achieve patient-centered care.
Introduction
Atrial fibrillation (AF) is growing in prevalence and is costly. As the most common cardiac rhythm disorder, 1 it is estimated to affect 33.5 million individuals across the world, 2 with the number of cases projected to increase exponentially over the next several decades. 3 In the United States, the number of AF cases is expected to double by 2050. 4 The rapid increase is likely attributable to the growing elderly population in the world 3 as AF is often associated with the aging process. 5 The rapid rise in AF cases has also led to increased medical costs and a public health crisis. The annual cost of AF treatment in the United States was estimated at $6.65 billion in 2006 6 and is expected to increase quickly over the next decades.
AF is a common cardiac arrhythmia characterized by chaotic electrical activity in the atria. It causes symptoms such as palpitations, shortness of breath, effort intolerance and fatigue, 2 and is associated with an increase in morbidity and mortality from heart failure, stroke, cognitive impairment 7 and other thromboembolic conditions. 8 Such conditions have contributed to a lower quality of life in AF patients compared to the general population and to other patients with coronary heart diseases. 1,9 Catheter ablation is an increasingly widespread method of treatment for atrial fibrillation and has shown good outcomes. 10 Using radiofrequency energy or cryotherapy to electrically isolate the pulmonary veins and ablate arrhythmia foci 11 during catheter ablation has demonstrated improvement in atrial fibrillation-related symptoms and enhancement in health-related quality of life (HQoL). 2 Additionally, ablation has measurable positive effects on the risk of death, stroke, and dementia 8 and is more effective than anti-arrhythmic medications. 12
To improve healthcare quality while also reducing healthcare costs, the Centers for Medicare & Medicaid Services (CMS) has developed the Hospital Readmission Reduction Program (HRRP), which penalizes healthcare providers that have high hospital readmission rates. 13,14 Since the implementation of the HRRP, readmission rates have been reduced by approximately 1% 15 ; however, this may not be sufficient to prove that the HRRP has caused a decline in hospital readmissions. More studies are needed. Additionally, with 2,592 out of 5,627 hospitals penalized in 2015 in the United States, the overall hospital readmission rate remains high. 13 Conditions comorbid with AF, such as heart attack and heart failure, are among the predominant hospitalization diagnoses being penalized by the HRRP. Reducing HRRP penalties relies strongly upon understanding the reasons behind hospital readmissions of AF patients. This understanding is also critical for minimizing the rising healthcare costs incurred from the rising number of AF cases.
The hospital readmission rate for patients with AF undergoing catheter ablation was reported to be approximately 10%. 11 Age, sex, primary payer, heart failure, hypertension, chronic renal disease, lung disease, and the number of AF hospitalizations during the prior years were significant predictors of 30-day hospital readmission. 11 While readmission rates for AF patients (10%) 16 are lower than those for other conditions affected by HRRP penalties such as acute myocardial infarction (20%), heart failure (25%), and pneumonia (18%), 17 the readmission rate for AF patients undergoing catheter ablation was quite high at 16.5%. 18 Although CMS currently tracks only 30-day readmissions, this does not mean that 90-day readmissions are unimportant. It may mean that CMS wants to see studies evaluating 90-day readmission rates in order to make an informed decision. Past research has demonstrated that hospital readmissions during the first 90 days are very common, 19 but predictors of 90-day hospital readmission have been less widely studied, leaving a knowledge gap in the literature. Thus, there is a strong need for this study to examine 90-day hospital readmission in AF.
Compared to the general population, AF patients are 3 times more likely to undergo multiple hospitalizations, and they spend 73% more annually in direct medical costs, including Medicare payments. 1,20 As AF cases are expected to rise within the next few decades, so will the urgency to understand AF’s risk factors and to create accurate models to predict AF. A clearer comprehension of AF will help alleviate the impending economic and public health burden.
Past research on hospital readmissions has typically used traditional hypothesis-driven statistical techniques to identify causal factors; these techniques rely heavily on assumptions and have many limitations when the data are large. 21,22 For example, traditional linear regression assumes homoscedasticity, independence of observations, normally distributed errors, and a linear relationship between the dependent and independent variables. These assumptions are often very difficult to meet, especially when there is a large number of variables and cases. Hospital readmission data typically consist of a large number of variables and cases and are therefore susceptible to these limitations. Machine learning, in contrast, is an innovative method that allows a large amount of data to be processed efficiently without relying on these traditional assumptions. This study aimed to use machine learning methods to develop prediction models of 90-day hospital readmissions for AF patients undergoing catheter ablation.
Methods
Data
The 2013 cycle of the Nationwide Readmissions Database (NRD) provided the raw data used in this study. The NRD was developed for the Healthcare Cost and Utilization Project (HCUP) to address the lack of nationally representative data on hospital readmissions for patients of all ages. It used the HCUP State Inpatient Databases (SID) and the corresponding verified patient numbers to track patients within participating states while following strict privacy guidelines. It included inpatients treated and discharged at community hospitals that were not rehabilitation or long-term acute care facilities. The 2013 NRD was created from 21 SID that were geographically diverse and covered 49.3% of the total population and 49.1% of all hospitalizations in the United States. Detailed information on the NRD can be found at the HCUP website. 23
Outcome
The main outcome for this study was 90-day hospital readmission status. The NRD defined an index event as the starting point for analyzing repeat hospital visits, while a hospital readmission was a subsequent inpatient admission within a specified period of time. Thus, a 90-day hospital readmission was defined as an index admission that had at least one readmission within 90 days after hospital discharge. It was a dichotomous variable, with 1 indicating that a patient had one or more readmissions within 90 days after discharge and 0 otherwise. Further, as defined by CMS, to be considered a readmission, the patient had to be readmitted to the same hospital or another applicable acute care hospital.
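The outcome definition above can be illustrated with a minimal sketch. The field names (`patient_id`, `days_to_event`, `los`) are hypothetical stand-ins for a patient linkage key, an admission day on a per-patient timeline, and length of stay; they are not taken from the study's code. Counting a gap of zero days as a readmission is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    patient_id: str     # hypothetical patient linkage key
    days_to_event: int  # hypothetical: admission day on a per-patient timeline
    los: int            # length of stay in days


def readmitted_within(index, admissions, window=90):
    """Return 1 if the patient has any admission within `window` days
    after discharge from the index admission, else 0."""
    discharge_day = index.days_to_event + index.los
    for adm in admissions:
        # Skip other patients and the index admission itself.
        if adm is index or adm.patient_id != index.patient_id:
            continue
        gap = adm.days_to_event - discharge_day
        if 0 <= gap <= window:
            return 1
    return 0
```

For example, an index stay admitted on day 100 with a 3-day stay and a second admission on day 150 gives a gap of 47 days, so the flag is 1.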
Demographics
Age, diagnosis, number of unique chronic conditions, patients’ length of hospital stay, procedures reported for a patient at discharge, gender, income, and primary payer were the demographic variables included in this study. Both weighted and unweighted prevalence estimates were calculated for the demographic characteristics. To compute the weighted demographic descriptive statistics, the survey package in R 4.0.0 was used. Clusters, strata, and weights were incorporated into the analysis to produce nationally representative results.
Data Processing
Patients were identified via the primary diagnosis code of AF (427.31) using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM), 24 and the primary or secondary procedure code of catheter ablation (37.34). Patients who died during hospitalization, were under 18 years of age, or had missing data on the length of hospital stay were excluded. For the 90-day hospital readmission status, patients discharged after September were excluded in order to allow for 90-day follow-up before the year ended in 2013. Patients having the following secondary diagnoses were excluded: atrial flutter, paroxysmal supraventricular tachycardia, atrioventricular nodal tachycardia, Wolff-Parkinson-White syndrome, paroxysmal ventricular tachycardia and ventricular premature beats. 11,18 Additional exclusion criteria were diagnosis or procedural codes showing prior or current implantation of a pacemaker or implantable cardioverter-defibrillator, and open surgical ablation. 11,18
Additional data processing was performed to prepare the data for variable selection. Irrelevant variables such as patient IDs, key identifiers, and weighting variables were excluded. Variables with all cases missing were deleted. To keep their scales consistent with those of the other variables, age and total hospital discharges were standardized. To prepare the data for machine learning, resampling methods were applied to the readmitted cases in order to adjust for class imbalance. Categorical data such as hospital bed size and discharge quarter were dummy coded to prevent the classifiers from incorrectly interpreting the variables as continuous data.
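The inclusion and exclusion rules above can be expressed as a single predicate over a discharge record. This is only a sketch: the record layout and key names are hypothetical, and the codes for the excluded secondary diagnoses are not listed in the text, so they are passed in by the caller rather than filled in here.

```python
AF_DX = "427.31"       # ICD-9-CM primary diagnosis code for AF (from the text)
ABLATION_PR = "37.34"  # ICD-9-CM procedure code for catheter ablation (from the text)


def eligible(rec, excluded_secondary_dx=frozenset()):
    """Return True if a discharge record meets the cohort criteria.

    `rec` is a dict with hypothetical keys: primary_dx, procedures,
    secondary_dx, died, age, los, discharge_month.
    """
    if rec["primary_dx"] != AF_DX:
        return False
    if ABLATION_PR not in rec["procedures"]:  # primary or secondary procedure
        return False
    if rec["died"]:                            # died during hospitalization
        return False
    if rec["age"] < 18:
        return False
    if rec["los"] is None:                     # missing length of stay
        return False
    if rec["discharge_month"] > 9:             # need 90 days of follow-up in 2013
        return False
    if set(excluded_secondary_dx) & set(rec["secondary_dx"]):
        return False
    return True
```

A cohort would then be `[r for r in records if eligible(r, codes)]`, where `codes` holds whichever exclusion codes the analyst has assembled.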
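The three preprocessing steps named above (standardization, dummy coding, and resampling) can be sketched in plain Python. The text does not say which resampling method was used; random oversampling of the minority class with replacement is an assumption made here for illustration, as are all function names.

```python
import random
from statistics import mean, pstdev


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation.
    Assumes the values are not all identical."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def dummy_code(values):
    """Encode a categorical variable as 0/1 columns, one per level,
    dropping the first level (alphabetically) as the reference."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size.
    Assumed technique; assumes both classes are present."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]
```

In common practice, resampling is confined to the training portion of the data so that duplicated cases do not appear in both the training and test sets.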
Variable Selection
With nearly 2,000 variables contained in the NRD, variable selection was needed to choose a subset of top predictors. Variable selection can reduce computer storage requirements, machine learning model training times, and data dimensionality, which can lead to improved model performance. 25 Top predictor variables were chosen based on relative variable importance computed using random forest. Random forest is a well-established tree-based method for variable selection. It works by identifying a small number of relevant predictors that can produce a more parsimonious model with predictive performance similar to that of a logistic model. 26 Using random forest, the top 30 features (i.e., variables) were identified and ordered by their predictive performance. The 30 features were subsequently narrowed down into a simpler model with the top 6 features that had relatively high variable importance, which were then input into the machine learning algorithms to produce risk prediction models.
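Ranking features by random forest importance, as described above, might look like the following scikit-learn sketch. The number of trees, the synthetic data generator, and the feature names are illustrative assumptions; the study's actual settings are not reported in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def top_features(X, y, names, k=6, seed=0):
    """Fit a random forest and return the k features with the highest
    impurity-based importance, as (name, importance) pairs in descending order."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1][:k]
    return [(names[i], float(importances[i])) for i in order]


def synthetic_data(n=500, p=10, seed=0):
    """Toy data (not NRD data): only the first two columns drive the outcome."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    y = (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)
    names = [f"f{j}" for j in range(p)]
    return X, y, names
```

Calling `top_features(*synthetic_data(), k=6)` returns six (name, importance) pairs; the same call with `k=30` followed by a manual cut to 6 would mirror the two-stage narrowing described in the text.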
Machine Learning Algorithms
In conventional statistical approaches, a model is built and then input into a machine (e.g., a computer). 27 This conventional approach is model-driven and relies heavily on assumptions about the shape of the data, and these assumptions may be prone to bias and error. Machine learning, on the other hand, provides a data-driven approach to analyzing data. Machine learning inputs the data directly into the machine instead of imposing assumptions on the model itself. The goal of the machine is then to perform pattern recognition in order to “learn” and output a model observed in the data. 27 Such a data-driven approach is particularly efficient for analyzing large complex data such as hospital readmissions data, genomic data, imaging data or stock market data, where patterns can be difficult to discern. Machine learning has great potential and implications in public health for the identification of healthcare needs as well as for crisis prediction and prevention. 28
We used supervised machine learning approaches for model development, which included k-nearest neighbors (k-NN), support vector machine (SVM), and decision tree classifiers. Supervised machine learning was chosen because the outcome of interest had already been identified (i.e., the hospital readmission status of the patients). 29 K-NN, SVM, and decision tree are among the best-known and most widely used machine learning methods for classification. Decision tree provides advantages of efficiency and flexibility that can lead to performance improvements, and it is used in a wide array of areas such as medical diagnosis, remote sensing, and speech recognition. 30 K-NN is widely used for pattern classification and is very effective when the probability distribution of the input variables is unknown, since it makes no distributional assumptions about the variables. 31 SVM is well matched to binary classification 32 and has been shown to work well with high-dimensional data. 33 All of these methods can model the non-linear relationships observed in real-life data. They are also easy to implement in clinical settings. The scikit-learn package in Python 3.8.3 was used for machine learning modeling in this study.
To guard against overfitting, the data were randomly split into a 60% training set and a 40% test set. Models were then applied to both the training and test sets, and their accuracies were recorded. We aimed to keep the difference in accuracy between the training and test sets to no more than 7% to avoid overfitting. When the data were overfitted, adjustments were made to the model parameters. Specifically, we applied L2 regularization to the model to overcome the overfitting issues that commonly occur with k-NN methods. Per United States federal regulations (45 CFR 46, category 4), this study is a secondary data analysis and does not require ethics review because the data were deidentified and publicly available.
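The 60/40 split and the comparison of training and test accuracy described above can be sketched with scikit-learn as follows. The hyperparameters (`n_neighbors=5`, `max_depth=5`, the default SVM kernel) are placeholders chosen for illustration, and the helper name `fit_and_compare` is hypothetical; none of these are reported in the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def fit_and_compare(X, y, max_gap=0.07, seed=0):
    """Fit three classifiers on a 60% training split and report train and
    test accuracy, flagging models whose gap exceeds `max_gap`."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.4, random_state=seed
    )
    models = {
        "knn": KNeighborsClassifier(n_neighbors=5),                      # placeholder k
        "tree": DecisionTreeClassifier(max_depth=5, random_state=seed),  # placeholder depth
        "svm": SVC(),                                                    # default RBF kernel
    }
    report = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        train_acc = float(model.score(X_tr, y_tr))
        test_acc = float(model.score(X_te, y_te))
        report[name] = {
            "train": train_acc,
            "test": test_acc,
            "overfit": (train_acc - test_acc) > max_gap,
        }
    return report


def toy_data(seed=0):
    """Synthetic six-feature binary problem (not NRD data)."""
    return make_classification(
        n_samples=1000, n_features=6, n_informative=4,
        n_redundant=0, random_state=seed,
    )
```

A model flagged with `overfit=True` would be a candidate for parameter adjustment, for example a larger `n_neighbors` or a shallower tree, before refitting.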
Results
For AF patients undergoing catheter ablation, there was a total of 9,468 (weighted N = 20,612) cases for the 90-day readmissions. After applying exclusion criteria and accounting for index admissions and death, 4,922 cases (weighted N = 10,547) remained. The 90-day hospital readmission rate was 17.6%. The average age of the patients was 64.9 years, and 62.9% were male (Table 1).
Table 1. Demographic Characteristics of 90-Day Readmissions (Numbers Outside of the Parentheses Are Weighted; Numbers Inside the Parentheses Are Unweighted).
Figure 1 displays the relative variable importance scores for the top 30 features. The higher the importance score, the more useful the feature is in predicting the outcome. Patient’s age was the most important of the top 30 features selected by random forest for determining the likelihood of being readmitted (Figure 1). The patient’s age, the total number of discharges from a hospital, the number of diagnoses a patient had at discharge, the number of chronic conditions a patient had at discharge, the number of procedures a patient had at discharge, length of initial hospital stay, and gender were the top predictor variables identified for the 90-day readmissions.

Figure 1. Relative variable importance of the top 30 features in predicting 90-day hospital readmissions in atrial fibrillation patients undergoing catheter ablation.
With a predictive accuracy of approximately 85%, k-NN performed the best among the machine learning methods. It was followed by decision tree at 72.3% (Figure 2). SVM showed an accuracy of 62.6%. K-NN also had the highest positive predictive value (i.e., precision) at 0.875 (Figure 2). Overall, accuracy, precision, sensitivity, specificity, and AUC were similar between k-NN and decision tree (Figure 3). K-NN and decision tree performed better than SVM.
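For reference, four of the reported metrics follow directly from a model's confusion matrix, as in the sketch below (AUC additionally requires predicted scores and is omitted). The counts in the usage note are made up for illustration and are not the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, sensitivity, and specificity from
    confusion-matrix counts, treating readmission as the positive class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),    # positive predictive value
        "sensitivity": tp / (tp + fn),  # recall, true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

With hypothetical counts `tp=40, fp=10, fn=20, tn=30`, this gives an accuracy of 0.70, a precision of 0.80, a sensitivity of about 0.67, and a specificity of 0.75.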

Figure 2. Performance metrics of machine learning models using the top 6 features (90-day readmissions).

Figure 3. Receiver operating characteristic curves for the various machine learning methods (90-day readmissions).
Discussion
The purpose of this study was to predict 90-day hospital readmission status for AF patients undergoing catheter ablation. Results demonstrated that machine learning methods were able to predict the occurrence of hospital readmissions with approximately 85% accuracy. The top predictors were age, total discharges from the hospital, the number of diagnoses a patient had upon discharge, the number of chronic conditions a patient had upon discharge, the number of procedures on the patient’s record, length of hospital stay, and gender.
The cross-sectional nature of the NRD data is a limitation of the study. Using data from multiple years would allow the development of potentially more accurate predictive models. Future studies may consider collecting longitudinal data to model prediction and confirm the results. Also, updates to ICD manuals and other healthcare references and tools have occurred since the data were collected. While hospital characteristics are influential factors in predicting hospital readmissions for conditions such as heart failure, 34 this does not seem to be as conclusive for atrial fibrillation readmission predictions. Furthermore, translating findings into institutional policies can be difficult for hospitals without an adequate budget. For larger hospitals that have many beds, an academic affiliation, adequate staffing, and a greater proportion of Medicare and privately insured patients, readmission prevention measures are normally more feasible. Outpatient management of AF may be relatively easy when compared with that of other cardiac conditions such as heart failure.
Another limitation is that large national databases such as the NRD have a long latency period between the time of data collection and the time that the data become available for the public to use and analyze. The results might differ with more recent data. Nonetheless, the data still offer insights that can help design prospective investigations in the future.
Not all studies benefit from machine learning techniques. In general, large national or international data, or big data collected from any organization, have the potential to benefit from machine learning to discover new insights, but small data do not. Traditional statistical approaches are regarded as model-driven, meaning that a model is predetermined by preformulated hypotheses or existing theories and is then tested to determine whether the null hypotheses can be rejected. Machine learning approaches, however, are data-driven, meaning that there is no predetermined model. Thus, hypothesis testing or theory testing is not applicable. In fact, the goal of machine learning is to discover new insights, new knowledge and new theories. Machine learning is relevant for organizations that want to gain more insights from their data in order to innovate rather than do business as usual, and it is useful for handling large and complex data when the relationships between the variables are not apparent. 35
A previous study demonstrated that older age and various comorbidities of patients who underwent AF ablation are factors independently associated with an increased likelihood of 90-day readmissions, 36 which matched our findings. Specifically, patients having 5 or more comorbidities were 2 times more likely to be readmitted within 90 days of initial hospital discharge. The literature had also reported gender, length of initial hospital stay, and disposition to facility 18 as well as number of chronic conditions 36 as top predictors, which was consistent with our study findings. Furthermore, using machine learning, our study was able to discover additional top predictors that were missed by prior studies that used traditional statistical approaches. These additional predictors include the total number of hospital discharges, the total number of diagnoses a patient had at discharge and the total number of procedures a patient had at discharge. The differences in analytical methods likely account for the discrepancies between our research and prior research, which had relied mainly on traditional statistical methods. Using machine learning to conduct analyses can lead to an improved understanding of the data and open new frontiers of discovery.
Our models were able to reach a high predictive accuracy of 85% using a supervised machine learning approach. Such models can be valuable for both policymakers and healthcare providers. Healthcare providers might find it helpful to look closely into a patient’s record and provide patients with more personalized medical treatments to improve healthcare quality and minimize hospital readmissions. Applying these predictive models to assess hospital readmission risks can contribute to effective preventative treatments, lower medical costs, improved patient care, and fewer deaths. 28
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors sincerely thank the Clinical Outcome Research and Education at Roseman University College of Dental Medicine for supporting this study.
