Abstract
Globally, heart disease (HD) persists as a major contributor to mortality rates, requiring accurate and efficient diagnostic models. While machine learning has shown promise in early detection, challenges such as missing data, class imbalance, suboptimal feature selection, and inefficient hyperparameter tuning hinder predictive accuracy and reliability. Many existing models fail to preprocess medical datasets effectively, leading to biased and computationally expensive predictions. To address these issues, this study proposes a robust hybrid framework for HD prediction. The Balanced Imputation-Normalization Framework incorporates K-Nearest Neighbors (KNN) imputation, StandardScaler normalization, and the Synthetic Minority Oversampling Technique (SMOTE). KNN imputation handles missing data effectively, ensuring reliable representation, while StandardScaler normalization standardizes feature values to enhance model stability. SMOTE addresses class imbalance by generating synthetic samples to augment the minority class. Feature selection is optimized using the Hungarian algorithm, which systematically selects the most relevant attributes while reducing redundancy. Additionally, Bayesian optimization fine-tunes hyperparameters to improve classification performance. For prediction, an ensemble learning approach combines Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Naïve Bayes (NB), and Extreme Gradient Boosting (XGBoost). The Voting Ensemble aggregates predictions using hard and soft voting mechanisms, improving robustness and generalization. Experimental results on benchmark heart disease datasets show that XGBoost attained the highest accuracy of 96.43%, followed by the Voting Ensemble at 95.66%, significantly outperforming traditional models and demonstrating that ensemble learning effectively improves accuracy and reduces computational complexity.
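The preprocessing-and-ensemble pipeline outlined above can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: a synthetic imbalanced dataset stands in for the benchmark heart disease data, SMOTE (from the separate imblearn package) and the Hungarian-algorithm feature selector are omitted, and sklearn's GradientBoostingClassifier is substituted for XGBoost so the sketch has no dependencies beyond scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import (RandomForestClassifier, VotingClassifier,
                              GradientBoostingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a heart disease dataset: 13 features, 70/30 class split.
X, y = make_classification(n_samples=400, n_features=13,
                           weights=[0.7, 0.3], random_state=0)

# Simulate missing entries, then impute them with KNN as in the framework.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan
X = KNNImputer(n_neighbors=5).fit_transform(X)

# StandardScaler normalization step.
X = StandardScaler().fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Soft-voting ensemble over the five base learners named in the abstract
# (GradientBoosting stands in for XGBoost here).
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # averages predicted probabilities; "hard" majority-votes labels
)
ensemble.fit(X_tr, y_tr)
acc = ensemble.score(X_te, y_te)
```

Switching `voting="soft"` to `"hard"` reproduces the other aggregation mechanism mentioned in the abstract; soft voting typically performs better when the base learners produce well-calibrated probabilities.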
