Explainable artificial intelligence–driven ensemble learning for asthma risk prediction using machine and deep learning

Abstract

Objective

Asthma, a chronic respiratory condition characterized by airway inflammation and constriction, affects millions of individuals worldwide, resulting in high healthcare expenses and a lower quality of life. Early prediction and control of asthma risk are critical for avoiding exacerbations and improving outcomes.

Methods

In this study, we describe a comprehensive asthma prediction model that uses machine learning and deep learning techniques to estimate asthma risk based on a variety of health and environmental parameters. Recursive feature elimination and Extra Trees Classifier were used to choose features, and the synthetic minority over-sampling approach was used to balance the dataset to overcome class imbalance. Hyperparameter tuning was used to optimize performance for 12 machine learning models such as extreme gradient boosting, Random Forest, and support vector machine as well as deep learning models, including multilayer perceptrons, convolutional neural networks, recurrent neural network, and artificial neural network.

Results

After hyperparameter tuning, ensemble approaches that used both hard and soft voting were evaluated. The tuned soft voting ensemble that combined XGBoost and CatBoost achieved the highest accuracy (93.61%). Shapley additive explanations and local interpretable model-agnostic explanations were employed to make predictions interpretable, providing information on feature contributions and boosting clinician confidence. A Flask server and web interface were also deployed, enabling real-time user interaction where patients and medical professionals could enter data and obtain asthma risk estimations immediately.

Conclusions

This study presents an accurate and explainable asthma risk prediction framework using ensemble machine and deep learning models, achieving 93.61% accuracy with real-time clinical applicability.

Keywords

Asthma machine learning recursive feature elimination Extra Trees Classifier synthetic minority over-sampling approach Shapley additive explanations local interpretable model-agnostic explanations web application

Introduction

Asthma facts

Asthma is not only a common health concern but also a highly complicated disorder that has received considerable interest in the field of medical research due to its multifactorial nature and extensive societal impact.¹ It affects more than 260 million people worldwide and is a major source of chronic illness, especially in children, although it can affect people of all ages. Asthma is believed to affect 1 in 13 individuals globally; it significantly raises healthcare expenses and is a major cause of missed work and school days. Approximately 500,000 people die from the illness annually, the majority of whom live in low- and middle-income nations with little access to high-quality medical care.

The characteristics of asthma include persistent inflammation and constriction of the airways, which cause sporadic episodes of coughing, chest tightness, wheezing, and shortness of breath.² These symptoms can be moderate to severe and can differ from person to person. Allergens, including pollen, dust mites, and pet dander; environmental pollutants such as cigarette smoke and industrial emissions; respiratory infections; and even physical activity or stress can cause asthma episodes, also known as exacerbations. Even those with managed asthma may have unexpected, severe symptoms that necessitate emergency medical attention due to the unpredictable nature of asthma exacerbations. Patients with asthma experience significant emotional and psychological effects in addition to the medical impact. Anxiety or sadness can be exacerbated by persistent symptoms and the fear of unexpected flare-ups, which can also limit engagement in everyday activities and lower physical fitness. Asthma is one of the main reasons for kids to miss school, an event which has an impact on their social and intellectual development.³ Similar disturbances in work life are common among adults with asthma, which lowers productivity and increases healthcare service consumption. It has an enormous financial impact with direct expenses for medical care, hospital stays, and prescription drugs as well as indirect expenditures due to missed work, absenteeism, and early mortality.

Even with advanced treatment options such as bronchodilators and inhaled corticosteroids (ICSs), which help manage symptoms and avert episodes, a sizable percentage of asthma patients continue to receive insufficient care or have their condition poorly managed. Lack of access to healthcare, poor drug adherence, underdiagnosis, and improper handling of environmental triggers are some causes of poor asthma management.⁴ Furthermore, the underlying causes, symptoms, and treatment responses of asthma can differ greatly from person to person because it is an illness with multiple phenotypes. To create more individualized treatment plans, research studies have recently concentrated on obtaining a better understanding of these characteristics,⁵ especially through the use of biomarkers and genetic investigations. Modern asthma care seeks to determine the underlying biological mechanisms causing the disease in each patient in addition to the conventional emphasis on symptom management. The use of data-driven methods to enhance early diagnosis, optimize treatment plans, and anticipate exacerbations is becoming increasingly important as machine learning (ML) and predictive modeling become more widely available. These methods can help lower the disease burden and enhance patient outcomes. Thus, research on asthma remains crucial as scientists work to understand its intricacies and reduce its profound effects.

Problem statement

In clinical practice, accurate identification and anticipation of asthma exacerbations continues to be a substantial issue despite tremendous advancements in our understanding and treatment of the condition. Since asthma is a very diverse illness, it can be challenging to diagnose, particularly in its early stages or when symptoms are intermittent because its symptoms might mimic those of other respiratory disorders. A delayed or incorrect diagnosis can result in ineffective therapy, extended suffering, and a higher chance of potentially fatal severe asthma attacks. Additionally, it can be challenging to identify high-risk patients before a crisis arises because asthma exacerbations are frequently unanticipated and can be caused by a range of environmental, genetic, and lifestyle variables. Current diagnosis techniques mostly rely on subjective judgments by doctors, pulmonary function tests, and patient-reported symptoms, all of which might be impacted by several external factors. These approaches do not always fully convey the disease’s complexity, especially when it comes to mild or sporadic asthma. More precise, data-driven methods are therefore desperately needed to enable early diagnosis and enhance exacerbation prediction. Furthermore, in the current therapeutic environment, individualized asthma management that is based on each patient’s unique triggers and molecular markers remains an unfulfilled objective.

By combining clinical data, environmental factors, and patient health indicators, this study attempted to solve these issues by creating and assessing ML models that can more accurately diagnose asthma and predict exacerbations. The study objective was to develop a tool to help physicians make better decisions to ultimately improve patient outcomes and enable more effective asthma care.

Research motivation

This study was motivated by the substantial worldwide burden of asthma, its complexity, and the difficulties physicians encounter in correctly diagnosing the illness and predicting its course. Asthma remains one of the most common causes of hospitalization, emergency room (ER) visits, and lost productivity globally, even with the availability of effective therapies. The unpredictable nature of asthma episodes and the challenges involved in early diagnosis, especially in cases that are mild or asymptomatic, point to a serious weakness in the current healthcare management of this condition. This gap frequently leads to inadequate care, delayed intervention, and misdiagnosis, all of which can result in serious health issues, lower quality of life, and even death.

The diagnosis and treatment of asthma could be greatly enhanced by the development of ML and data-driven methods in the medical field. Utilizing clinical, environmental, and lifestyle data may help reveal trends and prognostic signs that conventional diagnostic methods might overlook. By creating models that help physicians diagnose patients more accurately and anticipate exacerbations, we can improve patient care, cut down on needless hospital stays, and customize treatment regimens to meet the needs of each patient. This research was strongly motivated by the potential to improve clinical outcomes and lessen the socioeconomic impact of asthma. Another goal of this research is the possible integration of ML models into standard clinical processes, which could change asthma care by adding a new level of decision support.

Study aims and contributions

The goal of this research was to use ML and deep learning techniques to create a reliable and understandable predictive model for asthma risk and management. The objective was to develop a model that can support individualized treatment, predict asthma outcomes with accuracy, and provide doctors an easily navigable tool for making decisions in real time. A list of the most significant contributions made by this study is provided below.

1. Utilization of the synthetic minority over-sampling approach (SMOTE) to rectify class imbalance and ensure equitable learning and accurate predictions in all classes.

2. Use of the recursive feature elimination (RFE) and Extra Trees Classifier (ETC) for feature selection to find the most influential features enhancing the interpretability and efficiency of the model.

3. Use of 12 ML algorithms and four deep learning models, optimization of their performance through hyperparameter tuning (HPT), and combination of the models through hard and soft voting ensembles, which were in turn refined through HPT to further increase accuracy.

4. Use of local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP) to enhance model interpretability, making predictions clear and intelligible for clinical application and the creation of a user-friendly web interface that enables real-time prediction, enabling medical practitioners to enter patient information and obtain asthma risk assessment results promptly.

Prediction reliability was increased and important shortcomings of current approaches were successfully addressed using the suggested framework. The remainder of this paper is structured as follows. A review of previous studies on asthma prediction is presented in Section 2. The suggested methodology, which includes feature selection, data preparation, and model construction, is explained in Section 3. The experimental outcomes, performance assessment, results with explainability analysis, and web deployment are presented in Section 4.

Literature review

Jeddi et al.⁶ used data from 202 pediatric asthma patients to create ML models for predicting childhood asthma in Morocco. Using chi-square test, they evaluated 36 risk factors and found 19 significant factors, including early-life behaviors such as breastfeeding and birth mode, environmental exposures (mites, mold, and cold air), and family history of atopic illnesses, with Random Forest (RF) achieving a maximum accuracy of 84.9%. The study emphasized the need to eliminate avoidable maternal and prenatal risk factors to lower the incidence of asthma. Murad et al.⁷ employed ML to predict asthma symptoms and deliver treatment through an Android application. Eight algorithms were used to examine data from 4500 individuals spanning 23 asthma-related characteristics. The Decision Tree (DT) classifier achieved an accuracy of 87%. TensorFlow’s integration of ML into the app provides tailored asthma treatment. The Java-based Android Studio software offers a user-friendly way to manage asthma. In the study by Li et al.,⁸ asthma prediction was performed using a clinical dataset of 152 samples with 24 standard blood indicators. An affinity network documented the intrinsic links between samples, and a projection matrix was used to minimize feature dimensionality to improve the classification accuracy. The affinity network was used to improve prediction in the development of a novel classifier called AGEC. Five additional models were compared with AGEC, which performed better, with an accuracy of 72.50%. The findings show that, compared with previous models, incorporating correlations between variables greatly increases the accuracy of asthma predictions. Joo et al.⁹ used a traditional operational definition and conducted a study involving asthma patients from Seoul St. Mary’s hospital and St. Paul’s hospital between January 2017 and January 2018. A random sample of 353 asthma patients was selected for analysis out of the 4235 who were identified. 
Then, ML methods were used to increase the precision of asthma diagnosis. The extreme gradient boosting (XGBoost) model performed well with an accuracy of 87.1%, area under the curve (AUC) of 93.0%, sensitivity of 82.5%, and specificity of 97.9%. The use of ICS/long-acting beta-agonist (LABA), long-acting muscarinic antagonist (LAMA), and leukotriene receptor antagonist (LTRA) was identified as an important explanatory variable for accurate diagnosis, showing their importance in asthma management and prediction. Xie et al.¹⁰ used data from 9716 participants of the 2021–2022 National Health Interview Survey (NHIS) and created ML models to predict asthma in youth. Undersampling the majority class produced the best results, with logistic regression (LR) achieving an AUC of 0.7654 and an F1 score of 0.3452. Early screening and identification of childhood asthma was facilitated by significant risk factors such as a low family poverty ratio and parental history of asthma. Vatsal et al.¹¹ used an algorithm to predict exacerbations with 90% accuracy using data from 29,396 patients. The study demonstrates how ML-based models might enhance asthma management by providing tailored care and lower the burden that asthma imposes on patients and healthcare systems. The study also discusses the challenges and potential solutions for applying these models in clinical settings. Bose et al.¹² used ML models to predict which children who were diagnosed with asthma before 5 years of age would continue to make asthma-related medical visits between 5 and 10 years of age. The best-performing model, XGBoost, achieved a mean average normalized sensitivity accuracy (ANSA) of 0.43 using data from 9934 children. The age at the most recent diagnosis, number of asthma visits, Black ethnicity, eczema, and allergic rhinitis were important predictors. The results demonstrated that ML can accurately forecast persistent asthma, offering parents and professionals helpful information. 
Kothalawala et al.¹³ predicted school-age asthma using clinical and environmental data from the Isle of Wight birth cohort (n = 1368), with a prevalence of approximately 15%. The best predictive features for every model were found using RFE. Fivefold cross-validation, imputation, resampling, and seven ML methods were used. The best results were obtained using support vector machines (SVMs) for the Childhood Asthma Prediction in Early Life (CAPE) (AUC = 0.71) and Childhood Asthma Prediction at Preschool Age (CAPP) (AUC = 0.82) models. Excellent sensitivity for detecting chronic wheezers and good generalizability were demonstrated via external validation in the Manchester Asthma and Allergy Study (MAAS) where the models performed well at ages 8 and 11 years.

Finkelstein et al.¹⁴ predicted adult asthma exacerbations using telemonitoring data from 7001 self-reports. Based on a 7-day window, the ML models, SVMs, adaptive Bayesian networks and naïve Bayesian classifiers, predicted exacerbations with accuracies of 0.77, 1.00, and 0.80, respectively. The findings demonstrate how ML may be used to provide individualized decision support in telemonitoring with potential uses in the treatment of chronic illnesses such as asthma. R et al.¹⁵ used a dataset that included information on lung function, symptoms, and clinical history, wherein a Naïve Bayes classifier produced good results. By picking the best samples, one-class SVMs (OCSVMs) also enhanced data representation. The model achieved 84.7% accuracy. Lugogo et al.¹⁶ measured peak inspiratory flow (PIF), inhalation volume, inhalation duration, and time to PIF over a 12-week inhaler-use period in adult patients with poorly managed asthma. The model predicted exacerbations within the next 5 days using gradient-boosting trees with a receiver operating characteristic (ROC) AUC of 0.83. The greatest predictor was increased inhaler use, specifically the mean number of inhalations in the 4 days before the prediction, indicating the promise of sensor-based models for early asthma treatments. Hussain et al.¹⁷ tested models using a large, longitudinal dataset from the Optimum Patient Care Research Database, comprising de-identified patients aged 8–80 years with asthma diagnosed by clinicians and 3 years of continuous data (2016–2018). Risk factors from the first year and the frequency of asthma attacks over the next 2 years will be used in feature selection and classification (one- and two-class classifiers). De Hond et al.¹⁸ assessed how well home-monitoring data from patients with stable mild-to-moderate asthma could predict severe asthma exacerbations using LR and ML models (XGBoost and one-class SVM). 
The models were compared with a clinical rule based on daily peak expiratory flow and symptom data from 101 patients (validation cohort) and 165 patients (development cohort). Only 0.2% of all daily measures included severe exacerbations. The study emphasizes the difficulty involved in the use of these models in clinical situations where there are many false alarms and few exacerbations. Luo et al.¹⁹ used ML models created by combining patient characteristics and environmental factors using data from the Asthma Symptom Tracker, which includes information regarding 2912 weekly asthma assessments from 210 children performed over a 2-year period. The accuracy of the best model was 71.8%.

Gunawardana et al.²⁰ reported their results obtained using a hybrid model that combined light gradient boosting machine (LightGBM) and LR, with a sensitivity of 79.85% and an AUC of 0.9062. The suggested tool helps patients and professionals by offering a simple and efficient way to identify potential asthma cases. Tomita et al.²¹ have suggested a strategy that uses RF and an optimized XGBoost classifier to enhance asthma diagnosis in adults by utilizing objective data from medical records. Bayesian optimization was used to tune the RF and XGBoost hyperparameters, and the models were evaluated in terms of accuracy and AUC. XGBoost surpassed RF with an accuracy of 81% and an AUC of 85%, demonstrating its potential for reliable asthma diagnosis.

AlSaad et al.²² used recurrent neural networks (RNNs) and created deep learning models to forecast emergency department (ED) visits in children with asthma. The RNN models outperformed the baseline model, with an F1 score of 0.61, AUC–ROC of 0.85, and AUC–PR of 0.74. The study has shown that children at high risk of frequent ED usage can be successfully identified using RNN models based on the electronic health records (EHRs) data, enabling more individualized asthma treatment. He et al.²³ analyzed 132 characteristics of 1754 multiethnic children to predict asthma occurrence by the age of 5 years, using data from the Canadian Healthy Infant Longitudinal Development (CHILD) cohort study to find early indicators of asthma in children. Early-life data (≤1 year of age) offered limited predictive potential (area under the precision–recall curve (AUPRC) <0.35) according to ML models; however, data from children aged 3 years showed a considerable improvement in accuracy (area under the ROC curve (AUROC) >0.90, AUPRC >0.80). The most significant indicators of early childhood asthma were wheezing and atopy, while maternal asthma, antibiotic exposure, and lower respiratory tract infections were consistently strong predictors. More focused interventions beginning at the age of 3 years are supported by this temporal understanding.

Gaudillo et al.²⁴ investigated the use of single nucleotide polymorphisms (SNPs) in conjunction with ML models to evaluate an individual’s susceptibility to asthma. For feature selection, the researchers used RF and RFE, finding SNPs strongly linked to asthma. This study has demonstrated how ML models can be used to supplement conventional techniques for predicting genetic susceptibility to complicated illnesses such as asthma. Table 1 summarizes relevant previous studies.

Table 1.

Related previous studies on asthma prediction.

Reference	Dataset	Feature selection	Method	Number of ML algorithms used	Selected features
Jeddi et al.⁶	202 patients from IBN Sina Hospital	Chi-square	Random Forest	4	19 out of 36
Murad et al.⁷	Data collected from media (n = 4500)	×	Decision Tree	8	23
Joo et al.⁹	Data of 4235 patients collected from two hospitals	Feature extract	XGBoost	5	×
Xie et al.¹⁰	Data of 9716 patients collected from the NHIS	LASSO + RF	XGBoost	5	17 out of 34
Vatsal et al.¹¹	Data of 29,396 patients collected from Kaggle	PCA	Ensemble	5	×
Bose et al.¹²	Data from 9934 patients collected from the EHRs	Filter + embedded	XGBoost	6	5%
Tomita et al.²¹	Data of 566 patients collected from KUH	×	XGBoost	2	8
Gaudillo et al.²⁴	Data collected from open SNP database	RFE + RF	RF + SVM	2	98%

× indicates that the corresponding information was not reported in the previous work.

RFE: recursive feature elimination; NHIS: National Health Interview Survey; SNP: single nucleotide polymorphism; RF: Random Forest; XGBoost: extreme gradient boosting; EHRs: electronic health records; SVM: support vector machine; PCA: principal component analysis; KUH: Kyoto University Hospital.

Jeddi et al.⁶ have reported that, without sophisticated computational assistance, the model’s complexity may restrict its clinical usefulness, and its dependence on a small sample of predictors may leave out other pertinent risk factors, which could affect the model’s generalizability. The model’s robustness and generalizability to bigger more diverse populations are limited by the dataset’s small size (n = 202 children), which also makes it vulnerable to demographic and regional biases. Furthermore, although a variety of ML models were used, deep learning methods were not used, which can increase accuracy when used for bigger datasets. Murad et al.⁷ have stated that although the Android app method is user-friendly, it might not be as reliable as required for clinical precision, especially in areas with limited resources or diversity. Furthermore, adding new physiological data sources and bigger more varied datasets could improve the model’s prediction power. Li et al.⁸ used a dataset comprising only 152 samples, which may restrict the generalizability of the approach. Although practicable, the use of common blood biomarkers may not account for all pertinent factors impacting asthma, which could compromise the model’s comprehensiveness. The use of larger, more varied datasets and incorporation of more clinical and environmental variables can help future studies enhance the predictive models’ application and robustness. Joo et al.⁹ used clinical criteria for dataset inclusion, which may have introduced selection bias and reduced the model’s applicability to larger populations with varying demographics or degrees of healthcare access. Furthermore, some ML models, often referred to as “black-box” models, are not interpretable, implying that physicians may find it difficult to comprehend or believe the predictions, especially if the model’s decision-making procedures are unclear. 
Xie et al.¹⁰ reported that the model’s comprehensiveness is limited by the NHIS’s limited feature selection, especially the lack of comprehensive clinical factors; furthermore, the use of self-reported data may introduce errors. Performance indicators indicate a moderate capacity for prediction (F1 score of 0.3452), and the absence of external validation reduces trust in the model’s cross-population resilience. To improve the model’s generalizability and dependability in clinical or public health applications, more testing and improvements are required. Vatsal et al.¹¹ showed that ensemble learning can be used to predict asthma; however, they also emphasize that in order to improve reliability and make the model scalable for use in clinical settings, thorough optimization and validation across a variety of datasets are required. Bose et al.¹² have reported that without longitudinal updates, the model’s predictive ability to track the course of asthma over time is weakened by its reliance on self-reported and clinically documented data. The intricacy of the ML algorithms employed presents substantial interpretability issues, making it challenging for doctors to comprehend and reliably use the predictions in practice. The model’s potential for widespread clinical use is limited in the absence of external validation, which raises questions about its dependability in various healthcare environments. Kothalawala et al.¹³ did not adequately address the problem of overfitting, which can lead to poor performance when applied to new data but high accuracy within the training dataset. The model’s dependability in various clinical contexts is further compromised by the absence of external validation. Finkelstein et al.¹⁴ used a 7-day monitoring window in their study, which might have led to missed recording of longer-term changes that are essential for precise exacerbation prediction. 
Deep learning methods or more intricate ML models that could enhance predictive performance were not compared in this study. R et al.¹⁵ have reported that despite using the OCSVM method to choose the best data representation, the lack of a more thorough sampling strategy could still cause bias. This is especially true if the dataset is diverse or does not contain sufficiently representative samples across patient subgroups. It would be difficult to use this model in clinical settings without interpretability mechanisms since it might not offer the transparency required for well-informed clinical decision-making. Lugogo et al.¹⁶ included 360 adult patients with asthma; however, given the variety of asthma manifestations, this sample size may not have captured the heterogeneity required for wider general applicability. Although effective, the model’s use of gradient-boosting trees restricts interpretability, which may make clinicians less confident in implementing such predictions for tailored treatment modifications. Hussain et al.¹⁷ stated that clinical implementation is hampered by the lack of interpretability of ML models, which makes it difficult for medical professionals to believe or act upon the model’s predictions in the absence of explicit explanations. Future research should consider resolving data completeness, increasing model clarity, and incorporating external validation to increase clinical utility. Gunawardana et al.²⁰ discovered crucial predictive traits but did not look at other potentially relevant elements such as genetic predispositions, environmental exposures, or thorough medical histories, which could improve their predicted performance. The study did not explore as to how the predictive tool can be incorporated into current healthcare systems, including user training and data protection, or the tool’s effect on clinical processes. 
AlSaad et al.²² used deep learning models that are frequently referred to as “black boxes” because of their lack of transparency, despite their excellent accuracy. Because they must comprehend the logic behind predictions in order to trust and act upon the model’s outcomes in patient care, doctors are unable to adopt these models in clinical practice owing to this lack of interpretability. The retrospective aspect of the study may have introduced biases due to the use of outdated data recording techniques, and the model may lack real-time prediction skills, limiting its practical application. Additionally, there is a risk of overfitting, which would impair the model’s performance on new data, and without external validation, its robustness in a variety of clinical contexts remains unknown. Gaudillo et al.²⁴ have proposed that moderate accuracy (62.5%) indicates constrained predictive capacity of the SNP-based model, precluding it from yielding accurate results for screening or clinical diagnosis. Additionally, the study used feature selection methods (RF and RFE) to identify important SNPs; however, it omits lifestyle and environmental factors that are known to contribute to asthma, which may limit the model’s overall accuracy and comprehensiveness. Despite their usefulness, ML algorithms such as SVM and k-nearest neighbors (KNN) present interpretability issues that make it difficult for clinicians to comprehend how certain SNPs affect asthma risk.

Several significant limitations persist in the existing research on asthma prediction despite the fact that several ML and deep learning techniques have been developed. First, single classifier models, which may have poor generalization and model instability across datasets, are used in several studies. Second, feature selection is frequently performed with a single technique, which may result in the overlooking of complementary merits of hybrid selection strategies. Third, models may be unable to reach their full predictive potential if systematic hyperparameter optimization is not regularly implemented. Furthermore, several previous studies have prioritized prediction accuracy over model interpretability, which restricts clinical trust and practical adoption. Additionally, only a few studies offer user-friendly or real-time deployment frameworks that can help clinical decision-making in the actual world.

The current study proposes a thorough and interpretable ensemble learning approach for asthma risk prediction to address these issues. The main contributions of this work are (a) integration of a two-stage feature selection pipeline that combines RFE and Extra Trees importance; (b) fine-tuning of numerous hyperparameters across various ML models; (c) creation of optimized ensemble models for hard and soft voting to increase robustness; (d) incorporation of explainable artificial intelligence (AI) techniques (SHAP and LIME) to improve model transparency; and (e) implementation of a web-based deployment interface for practical usability. When combined, these contributions offer an asthma prediction system that is more reliable, comprehensible, and clinically useful than current methods.
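The difference between hard and soft voting in item (c) can be illustrated with a small worked example. The sketch below is only an illustration under stated assumptions: the three sets of class probabilities are hypothetical values for a single patient, and the helper names hard_vote and soft_vote are introduced here and are not taken from the study's code.

```python
from collections import Counter


def hard_vote(labels):
    """Return the class label predicted by the largest number of base models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average per-class probabilities over models; return (label, averages)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean


# Hypothetical outputs [P(no asthma), P(asthma)] of three base models
# for one patient (illustrative numbers, not results from the study).
probs = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]

# Each model's own label is the class with the highest probability.
labels = [max(range(2), key=p.__getitem__) for p in probs]

hard_label = hard_vote(labels)             # majority of the labels
soft_label, mean_probs = soft_vote(probs)  # argmax of averaged probabilities
print(hard_label, soft_label, [round(m, 3) for m in mean_probs])
```

In this example, two weakly confident models outvote one highly confident model under hard voting, whereas soft voting lets the confident model dominate the averaged probability. This is one reason the two schemes can rank differently on the same data.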

Methodology

Proposed approach

In this study, we followed a systematic ML and deep learning strategy to create a predictive model for asthma. The dataset, which included essential variables for predicting asthma, was obtained from Kaggle.²⁵ We first reduced data dimensionality while preserving important predictive variables by selecting the most influential features using RFE and ETC. We then applied the SMOTE to balance the dataset by creating synthetic samples for the minority class, with the aim of improving model performance for minority outcomes. Subsequently, the dataset was divided into training (80%) and testing (20%) sets. In total, 12 ML algorithms were implemented, including LR, RF, AdaBoost (AB), gradient boosting (GB), DT, KNN, SVM (support vector classifier (SVC)), XGBoost, Bernoulli Naïve Bayes (BNB), ETC, CatBoost (CB), and Naïve Bayes (NB). The hyperparameters of every model were tuned to maximize performance. Four deep learning models, namely convolutional neural networks (CNNs), RNNs, artificial neural networks (ANNs), and multilayer perceptrons (MLPs), were also trained with HPT to improve predictive accuracy. To further enhance performance, we used ensemble approaches with both soft and hard voting procedures. The results were then refined by tuning the hyperparameters of the ensemble models. The predictive power of each model was assessed using performance indicators such as accuracy, precision, recall, F1 score, and AUC value. We then used LIME and SHAP to ensure model interpretability. This allowed us to pinpoint and explain how each feature affected the model's predictions, an aspect essential for clinical use. Finally, we created a web interface that offers healthcare practitioners an easy-to-use platform for entering patient data and obtaining real-time predictions. This interface supports individualized asthma management by improving the model's usability and accessibility in clinical settings. 
Figure 1 shows the workflow of our proposed work.

Figure 1.

Workflow of our proposed work.

Dataset description

The dataset includes 2392 patient records, each described by 15 attributes that offer comprehensive information about factors related to asthma. The demographic data include patient ID, age, and sex; these are captured along with environmental variables such as dust exposure and lifestyle characteristics such as smoking behavior. Health indicators include common asthma-related conditions such as pet allergy and hay fever; symptoms such as wheezing, shortness of breath, chest tightness, and coughing; nighttime symptoms; and familial predispositions such as a family history of asthma. Additionally, the dataset contains a floating-point measure of sleep quality because sleep is frequently disturbed by respiratory conditions. The dataset can be used for classification tasks because the binary diagnosis column, which indicates whether a patient has been diagnosed with asthma, acts as the target variable. The majority of the data are numerical, with clinical diagnosis, environmental exposure, and patient symptoms captured as both binary and continuous features.

Preprocessing

There were no missing values in the dataset, ensuring that every data point was complete and prepared for analysis without the need for imputation. Every feature had a numerical representation with float types for continuous variables (such as dust exposure and sleep quality) and integer types for discrete variables (such as sex, smoking, pet allergies, and diagnosis). The dataset was well suited for predictive modeling tasks because there were no categorical variables; thus, the dataset required less preprocessing and enabled simple deployment of ML models without the need for lengthy data transformation steps.

Balance data

Figure 2 depicts the impact of using the SMOTE to resolve class imbalance in the asthma prediction dataset. Prior to the use of the SMOTE, there was a notable imbalance, with 2268 cases in the majority class (diagnosis = 0) and only 124 in the minority class (diagnosis = 1). This disparity raised the possibility of bias because a model might give preference to the majority class and predict asthma cases less precisely. After the SMOTE was applied, the minority class was synthetically increased to match the majority class, giving an equal distribution of 2268 instances per class. The balanced dataset enabled the algorithms to learn equally from both classes, which improved the recognition of asthma cases and yielded a more impartial predictive model.

Figure 2.

Dataset balancing performed using SMOTE. SMOTE: synthetic minority over-sampling approach.
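The interpolation step behind the SMOTE can be sketched in a few lines. The snippet below is a simplified, standard-library illustration and not the implementation used in this study; the function name `smote_oversample`, the toy two-dimensional minority points, and the choice of k = 5 neighbours are assumptions made for the example. Only the class counts (2268 and 124) come from the text.

```python
import math
import random

def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Class counts reported in the text.
n_majority, n_minority = 2268, 124
n_needed = n_majority - n_minority  # synthetic samples required for balance

# Hypothetical minority-class points standing in for real patient features.
rng = random.Random(42)
toy_minority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_minority)]

new_points = smote_oversample(toy_minority, n_needed)
balanced_minority_count = n_minority + len(new_points)
print(n_needed, balanced_minority_count)  # 2144 2268
```

Because every synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the region already occupied by the minority data.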

Feature selection

Feature selection is the process of identifying and retaining only the most relevant variables for a predictive model; this process improves model accuracy, efficiency, and interpretability by removing irrelevant or redundant data. In this study, we used an ETC and RFE to focus on the factors most strongly related to asthma risk. The proposed model selected age, sleep quality, dust exposure, hay fever, chest tightness, coughing, and nighttime symptoms as the most important features. These variables highlight critical environmental and health-related elements, increasing the model’s efficiency and clinical relevance for asthma prediction.

Table 2 presents the most pertinent variables for asthma prediction identified using two feature selection methods, RFE and ETC. The ET method assigned each feature an importance score; age (0.179), sex (0.140), and smoking (0.113) were the main contributors, suggesting that demographic and lifestyle factors exerted a significant impact on asthma risk. In contrast, RFE iteratively eliminated less significant features and retained age, sleep quality, dust exposure, hay fever, chest tightness, coughing, and nighttime symptoms as the most important variables. Sex and smoking were not selected by RFE despite their high ET importance, indicating that they may be redundant with the retained features. Combining ETC importance with RFE concentrated the model on features with a major impact on asthma prediction, which supports both its predictive accuracy and its clinical interpretability.

Table 2.

Feature selection by combining ETC and RFE.

Feature	ET importance	Selected by RFE
Age	0.179498	Yes
Sex	0.140427	No
Smoking	0.112821	No
Sleep quality	0.082997	Yes
Dust exposure	0.069688	Yes
Pet allergy	0.064869	No
Family history of asthma	0.060183	No
Hay fever	0.056189	Yes
Wheezing	0.052096	No
Shortness of breath	0.049105	No
Chest tightness	0.046618	Yes
Coughing	0.043863	Yes
Nighttime symptoms	0.041648	Yes

RFE: recursive feature elimination; ET: Extra Trees; ETC: Extra Trees Classifier.
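The relationship between the two selection criteria in Table 2 can be checked directly from the reported values. The short sketch below only re-reads the table; the variable names are illustrative, and no model is trained. It compares the seven features with the highest ET importance against the seven retained by RFE.

```python
# ET importance scores and RFE decisions copied from Table 2.
et_importance = {
    "Age": 0.179498, "Sex": 0.140427, "Smoking": 0.112821,
    "Sleep quality": 0.082997, "Dust exposure": 0.069688,
    "Pet allergy": 0.064869, "Family history of asthma": 0.060183,
    "Hay fever": 0.056189, "Wheezing": 0.052096,
    "Shortness of breath": 0.049105, "Chest tightness": 0.046618,
    "Coughing": 0.043863, "Nighttime symptoms": 0.041648,
}
rfe_selected = {
    "Age", "Sleep quality", "Dust exposure", "Hay fever",
    "Chest tightness", "Coughing", "Nighttime symptoms",
}

# Seven features with the highest ET importance.
top7_by_et = sorted(et_importance, key=et_importance.get, reverse=True)[:7]

overlap = [f for f in top7_by_et if f in rfe_selected]
dropped_by_rfe = [f for f in top7_by_et if f not in rfe_selected]
rfe_share = sum(et_importance[f] for f in rfe_selected)

print(overlap)          # ['Age', 'Sleep quality', 'Dust exposure']
print(dropped_by_rfe)   # ['Sex', 'Smoking', 'Pet allergy', 'Family history of asthma']
print(round(sum(et_importance.values()), 3), round(rfe_share, 2))
```

Only three of the seven top-ranked ET features survive RFE, and the retained subset accounts for roughly half of the total ET importance, which illustrates that the two criteria rank features quite differently.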

In Figure 3, the bar chart depicts the seven features selected for asthma prediction using RFE and ETC. Age had the highest ET importance and was retained by RFE, whereas sex and smoking, although also ranked highly by ET, were not retained, most likely due to redundancy. RFE retained sleep quality, dust exposure, and hay fever because of their unique contributions, and chest tightness, coughing, and nighttime symptoms for their predictive value. Overall, RFE improved the model by emphasizing the most important, non-redundant features for accurate asthma prediction.

Figure 3.

Feature selection using a combination of ETC and RFE. ETC: Extra Trees Classifier; RFE: recursive feature elimination.

Justification of the features selection pipeline

To determine the need for the proposed RFE + ETC feature selection method, we examined baseline linear and tree-based classifiers trained on the entire feature set without explicit feature elimination. When assessed alongside ensemble and boosting techniques, these baseline models showed inferior stability and discriminative power. In the first stage, ET, which reduces model variance through randomized splits and captures nonlinear interactions, was used to assess global feature relevance. In the second stage, redundant or weakly contributing features were progressively eliminated using RFE so that only the most relevant and non-collinear variables were retained. Compared with single-step or unfiltered baselines, this two-stage strategy consistently improved interpretability, reduced the danger of overfitting, and improved generalization across several classifiers. Improved accuracy, AUC, and confusion matrix results showed that the final feature subset performed better in ensemble models. These results support the RFE + ETC pipeline as a sound and efficient feature selection method for predicting asthma risk.

Data splitting

We adopted an 80/20 data split; thus, 20% of the dataset was set aside for testing, and the remaining 80% was used for model training. This split ensures that most of the data are used to train the model, allowing it to identify trends and relationships between the features and the target variable. Model performance was then assessed on the test set, which was hidden throughout training. Evaluating on held-out data helps detect overfitting and indicates how well the model performs on fresh, unseen data, allowing a more realistic evaluation of its robustness and predictive ability.
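As a worked example of the split arithmetic, the sketch below shuffles record indices and holds out 20% of them. It assumes that the split was applied to the balanced dataset of 2 × 2268 = 4536 records and that the test share is rounded up; both are inferences from the fact that each confusion matrix reported later sums to 908 cases (for example, 411 + 413 + 62 + 22), not statements made in the text. The function name and seed are illustrative.

```python
import math
import random

def train_test_split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle indices and hold out a test fraction (rounded up)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_size)
    return indices[n_test:], indices[:n_test]

n_balanced = 2 * 2268  # 4536 records after balancing (assumed split input)
train_idx, test_idx = train_test_split_indices(n_balanced)

print(len(train_idx), len(test_idx))  # 3628 908
```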

ML

RF

RF is a potent ensemble learning technique for classification and regression problems.²⁶ To enhance predictive performance and minimize overfitting, it builds several DTs during training and combines their outputs. RF determines the best feature splits in classification tasks using impurity metrics such as the Gini index. The following formula is used to determine the impurity reduction at node j:

P_{i j} = L_{j} N_{j} - L_{left(j)} N_{left(j)} - L_{right(j)} N_{right(j)}

(1)

where $L_{j}$ is the impurity at node j and $N_{j}$ is the number of samples at node j; the terms indexed by left(j) and right(j) are the impurity and sample count of the left and right child nodes, respectively. The goal is to choose splits that maximize this impurity reduction and thereby enhance classification performance.
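Equation (1) can be evaluated by hand for a small node. The sketch below uses the Gini index as the impurity measure; the function names and the toy label lists are illustrative assumptions, not part of the study’s code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_reduction(parent, left, right):
    """Equation (1): L_j N_j - L_left N_left - L_right N_right."""
    return (gini(parent) * len(parent)
            - gini(left) * len(left)
            - gini(right) * len(right))

parent = [0, 0, 0, 0, 1, 1, 1, 1]

# An imperfect split: each child still contains one sample of the other class.
mixed = impurity_reduction(parent, [0, 0, 0, 1], [0, 1, 1, 1])

# A perfect split: each child is pure, so all parent impurity is removed.
pure = impurity_reduction(parent, [0, 0, 0, 0], [1, 1, 1, 1])

print(mixed, pure)  # 1.0 4.0
```

The perfect split yields the larger reduction, so a tree that maximizes equation (1) would prefer it.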

XGBoost

XGBoost is a powerful and widely used ML technique that is based on DT ensembles and designed to maximize speed and performance.²¹ It uses GB to build models sequentially, with each new model aiming to reduce the loss left by the preceding ones. Owing to features such as regularization, which helps prevent overfitting, and parallel processing, XGBoost is well known for its ability to handle large datasets efficiently.

CB

CB is a GB approach that works well with categorical data and does not require extensive preprocessing such as label or one-hot encoding. Its primary innovation is the native handling of categorical features, which lowers the chances of overfitting and improves model performance.²⁷ It uses a technique called ordered boosting, which is intended to prevent prediction shift caused by target leakage, and it supports fast training through graphics processing unit (GPU) acceleration. Furthermore, CB is a dependable option for both regression and classification problems owing to its stability and ease of tuning, and it frequently yields excellent accuracy with little feature engineering effort. The objective of the learning problem is to find a function H that minimizes the expected loss described in equation (2).

F(H) = \mathbb{E} [L(y, H(X))]

(2)

Ensemble method

An ensemble model combines the predictions of several ML models to produce a more powerful and precise model. The approach rests on the idea that, by combining the strengths of several models, the ensemble can usually outperform any single model in terms of generalization and error reduction.¹¹ Various forms of ensemble approaches exist, including boosting, which builds models sequentially to correct the errors of the previous ones, and bagging, which decreases variance by averaging predictions across models trained on random subsets of data. The proposed approach uses the weighted soft voting mechanism given below:

\hat{y} = \arg\max_{i} \sum_{j = 1}^{m} ω_{j} p_{i j}

(3)

Here, $p_{i j}$ is the probability of class i predicted by classifier j, $ω_{j}$ is the weight assigned to classifier j, and m is the number of classifiers in the ensemble.
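A minimal sketch of weighted soft voting for a two-classifier ensemble follows. The probabilities and weights are made-up illustrative values, and `soft_vote` is a hypothetical helper rather than the study’s implementation; it sums the weighted class probabilities across classifiers and returns the class with the largest total.

```python
def soft_vote(probas, weights):
    """Weighted soft voting.

    probas[j][i] is the probability of class i from classifier j;
    weights[j] is the weight of classifier j.
    Returns (predicted class index, weighted score per class).
    """
    n_classes = len(probas[0])
    scores = [
        sum(w * p[i] for w, p in zip(weights, probas))
        for i in range(n_classes)
    ]
    return max(range(n_classes), key=scores.__getitem__), scores

# Two classifiers that disagree on a single patient record.
probas = [
    [0.40, 0.60],  # classifier 1 leans towards class 1
    [0.70, 0.30],  # classifier 2 leans towards class 0
]

equal_label, _ = soft_vote(probas, weights=[1, 1])
skewed_label, _ = soft_vote(probas, weights=[3, 1])

print(equal_label, skewed_label)  # 0 1
```

With equal weights the more confident classifier decides the outcome; giving classifier 1 three times the weight flips the prediction.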

Deep learning

ANNs

ANNs are deep learning models that can capture intricate relationships among sensor data, environmental exposures, and medical history in asthma prediction. By analyzing these inputs, ANNs can predict asthma risk and exacerbations and support individualized treatment plans, giving doctors important information for asthma management.²⁸ ANNs are useful for handling the multifactorial nature of asthma because they are composed of layers of interconnected neurons that allow the model to learn from large datasets.

MLP

This neural network model analyzes intricate patterns in data such as lifestyle characteristics, environmental exposures, and medical history to predict asthma risks. MLPs provide individualized treatment and assist in identifying high-risk patients with asthma.²⁹ MLPs are capable of capturing nonlinear correlations in asthma-related data owing to their several layers of interconnected neurons. Although they perform well, their clinical application may be limited because they require large training datasets and are frequently difficult to interpret.

Performance analysis and experimental results

Environmental setup

Our experiments were conducted in an ML environment based on Python 3.10. We used scikit-learn 1.3 to develop and evaluate models and pandas 2.1 and NumPy 1.25 to manipulate and analyze data. We visualized the findings using matplotlib 3.8 and seaborn 0.12. The models were trained on systems that used CUDA 12.1 for GPU acceleration, which ensured efficient computation for larger datasets. The environment was managed using pip and virtual environments, which kept package versions consistent and simplified experimentation. Jupyter notebooks were used for both development and experimentation.

Performance parameters

We used several key metrics to assess the ML models. Accuracy measures the percentage of correct predictions across the dataset. To better understand how the models handled positive cases, we used precision, the proportion of predicted positive instances that are truly positive, and recall (sensitivity), the proportion of actual positive instances that the model captures. The F1 score, the harmonic mean of precision and recall, was particularly important for managing class imbalance.³⁰ Additionally, we used the AUC to assess each model’s ability to differentiate between classes, with higher values indicating better performance. Furthermore, we employed a confusion matrix to present a breakdown of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), showing where the model’s predictions were correct or incorrect. The performance metrics were computed using equations (4)–(8).

Accuracy = \frac{T P + T N}{T P + T N + F P + F N}

(4)

Precision = \frac{T P}{T P + F P}

(5)

Sensitivity = \frac{T P}{T P + F N}

(6)

F_{1} - Score = 2 * (\frac{Sensitivity * Precision}{Sensitivity + Precision})

(7)

AUC = \sum_{i = 1}^{n - 1} ({FPR}_{i + 1} - {FPR}_{i}) * \frac{{TPR}_{i + 1} + {TPR}_{i}}{2}

(8)
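The AUC can be approximated from a list of ROC points with the trapezoidal rule, in which each strip has width equal to the FPR step and height equal to the mean of the two neighbouring TPR values. The ROC points below are hypothetical values chosen for illustration, and `auc_trapezoid` is an illustrative helper.

```python
def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve given points sorted by increasing FPR."""
    return sum(
        (fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
        for i in range(len(fpr) - 1)
    )

# Hypothetical ROC points for illustration only.
fpr = [0.0, 0.1, 0.4, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0]

area = auc_trapezoid(fpr, tpr)          # 0.03 + 0.225 + 0.57
chance = auc_trapezoid([0.0, 1.0], [0.0, 1.0])        # diagonal line
perfect = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])

print(round(area, 3), chance, perfect)  # 0.825 0.5 1.0
```

The diagonal (chance-level) curve gives 0.5 and a curve through the top-left corner gives 1.0, which are the reference points used when the ROC figures are interpreted.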

Confusion matrix

Figure 4 displays the confusion matrices for two asthma prediction models, the SVM and ETC. With 411 true negatives, 413 true positives, and only a few misclassifications (62 false positives and 22 false negatives), the ETC model performed well. This suggests excellent sensitivity–specificity balance and great accuracy. In contrast, the SVM model performed poorly, producing 345 true positives and 272 true negatives, with a notably higher number of misclassifications (201 false positives and 90 false negatives). Thus, ETC is a more accurate model for predicting asthma risk.

Figure 4.

Confusion matrix for the selected features.
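Equations (4)–(7) can be applied directly to the confusion-matrix counts reported for Figure 4. The sketch below does so for the ETC and SVM models; the helper name is illustrative. Note that it computes precision, sensitivity, and F1 for the positive class only. In the results tables, recall equals accuracy in every row, which is what support-weighted averaging over both classes produces, so the positive-class values here are not expected to match the tabulated precision and recall.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * (sensitivity * precision) / (sensitivity + precision)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }

# Counts reported in the text for Figure 4.
etc = classification_metrics(tp=413, tn=411, fp=62, fn=22)
svm = classification_metrics(tp=345, tn=272, fp=201, fn=90)

print(round(etc["accuracy"] * 100, 2))  # 90.75
print(round(svm["accuracy"] * 100, 2))  # 67.95
```

Both accuracies agree with the values listed for ET and SVM in the performance table for the selected features.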

Figure 5 displays the confusion matrices for the BNB & GB and ETC & XGBoost soft voting ensemble models. The ETC & XGBoost model performed well, exhibiting high accuracy with 391 true negatives, 455 true positives, and few misclassifications (50 false positives and 12 false negatives). In contrast, the BNB & GB model was less successful in predicting asthma risk, with only 329 true negatives and 404 true positives and more misclassifications (112 false positives and 63 false negatives). Thus, the ETC & XGBoost combination is the better of the two models.

Figure 5.

Confusion matrix for the soft voting combination models.

Figure 6 displays the confusion matrices for two classifier combinations that use hard voting. The ETC & XGBoost combination model performed strongly, with 440 true positives and 400 true negatives as well as only 41 false positives and 27 false negatives. This suggests that the combination of ETC and XGBoost is highly accurate in differentiating between the two classes. In contrast, the GB & AB combination performed poorly, with fewer true positives (373) and true negatives (360). It also misclassified more cases, with 81 false positives and 94 false negatives, resulting in lower overall accuracy. Thus, the ETC & XGBoost combination is superior to the GB & AB combination.

Figure 6.

Confusion matrix for the hard voting combination models.
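The difference between hard and soft voting is easiest to see with two classifiers that disagree. In the sketch below, hard voting counts predicted labels, while soft voting averages predicted probabilities. Breaking ties in favour of the lowest class label is an assumption (it mirrors the documented behaviour of scikit-learn’s VotingClassifier, although the text does not state which implementation was used), and the probabilities are made-up values.

```python
from collections import Counter

def hard_vote(labels):
    """Majority vote over predicted labels; ties go to the lowest label."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)

def soft_vote(probas):
    """Unweighted soft vote: class with the highest mean probability."""
    n_classes = len(probas[0])
    mean = [sum(p[i] for p in probas) / len(probas) for i in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)

# Classifier 1 is confident in class 1; classifier 2 weakly prefers class 0.
probas = [[0.20, 0.80], [0.55, 0.45]]
labels = [max(range(2), key=p.__getitem__) for p in probas]  # [1, 0]

hard_label = hard_vote(labels)
soft_label = soft_vote(probas)
print(hard_label, soft_label)  # 0 1
```

With only two voters, every disagreement is a tie under hard voting, so the outcome depends on the tie-breaking rule, whereas soft voting lets the more confident classifier decide. This is one plausible reason why the hard voting scores are slightly lower than the soft voting scores for the same pairs.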

Figure 7 displays the confusion matrices for two classifier combinations (CB & AB and CB & XGBoost) following hyperparameter tuning with soft voting. The CB & XGBoost combination model performed strongly, achieving excellent accuracy with few false positives (45) and false negatives (13) as well as many true positives (454) and true negatives (396). The CB & AB model was slightly inferior, with fewer true positives (451) and true negatives (385) and more false positives (56) and false negatives (16). Thus, the CB & XGBoost combination performed better than the CB & AB combination.

Figure 7.

Confusion matrix for the soft voting combination model after hyperparameter tuning.

Figure 8 shows the confusion matrices for two deep learning models, a CNN and an ANN. Overall, the ANN model performed better, with fewer false positives (97) and false negatives (42) and more true negatives (376) and true positives (393). In contrast, the CNN model yielded fewer true negatives (315) and true positives (391), a higher number of false positives (158), and a comparable number of false negatives (44). This suggests that the CNN model is more prone to misclassification, particularly of negative cases. Consequently, the ANN model performs better than the CNN model.

Figure 8.

Confusion matrices for deep learning models.

ROC curves

Figure 9 presents the ROC curves comparing the performance of the ML models trained on the seven features chosen using the ETC-RFE method. High-performing models with AUC values close to 1, such as ETC (0.972), XGBoost (0.971), RF (0.969), and CB (0.963), had curves near the top-left corner, indicating strong predictive power. Moderate performers, such as GB (0.912) and DT (0.874), exhibited respectable but not exceptional efficacy. The SVM model performed the worst, with an AUC of 0.760; its curve lay closest to the diagonal line, indicating poor class distinction.

Figure 9.

ROC curves for 12 machine learning models. ROC: receiver operating characteristic.

Figure 10 demonstrates the performance of the top ten voting model combinations using ROC curves and AUC values. The highest AUC (0.977) was achieved by the combination of ETC & XGBoost, closely followed by RF & XGBoost, with an AUC of 0.975 and ETC & CB, with an AUC of 0.974. These pairings, which had curves toward the upper-left corner, showed good predictive performance. Other combinations, such as BNB & GB (AUC = 0.888) and GB & AB (AUC = 0.905), demonstrated relatively poorer performance with curves closer to the diagonal, indicating reduced effectiveness in class distinction. Overall, the ETC & XGBoost combination demonstrated the best overall performance.

Figure 10.

ROC curves for the top 10 soft voting model combinations. ROC: receiver operating characteristic.

Figure 11 depicts the ROC curves of specific voting model combinations after HPT using a soft voting technique. The pairings with the best AUC were ETC & CB (0.977), ETC & XGBoost (0.976), and ETC & GB (0.975). The curves of these models lay near the top-left corner, indicating strong class distinction. Other combinations, such as BNB & GB, had a lower AUC (0.954) but remained useful. HPT further enhanced the performance of these model combinations, and ETC-based ensembles had a distinct advantage over the other ensembles in this set.

Figure 11.

ROC curve of the soft voting combination model after hyperparameter tuning. ROC: receiver operating characteristic.

Figure 12 presents the ROC curves illustrating the performance of four deep learning models: MLP, ANN, CNN, and RNN. The best-performing models were MLP and ANN, both with the highest AUC values of 0.90. Their ROC curves were closer to the top-left corner, indicating excellent classification accuracy. RNN had the lowest AUC (0.85), indicating somewhat worse performance, followed by CNN (AUC = 0.86). The curves show that CNN and RNN have relatively good performance; however, their predictive power was lower than those of MLP and ANN models, which exhibited the best performance for differentiating between classes in this set.

Figure 12.

ROC curve of four deep learning models. ROC: receiver operating characteristic.

Performance result

Table 3 details the performance measures for the various models. ET and RF yielded the highest accuracy (90.75% and 90.64%, respectively) and F1 scores (90.75% and 90.64%, respectively), with ET achieving the highest AUC value (97.02%). LR, NB, and BNB performed moderately, with accuracy and F1 scores of approximately 75%. SVM had the poorest performance, with an accuracy of 67.95%. CB, XGBoost, and GB produced strong results, with accuracies of 89.65%, 89.32%, and 84.14%, respectively, while KNN also performed well (accuracy = 82.16% and AUC = 91.85%). Overall, ensemble methods such as RF, ET, and boosting algorithms produced the most favorable results.

Table 3.

Model performance for the selected features.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 score (%)	ROC value (%)	AUC value (%)
Logistic regression	75.66	76.2	75.66	75.63	75.88	81.37
Random Forest	90.64	90.94	90.64	90.64	90.79	96.31
Decision Tree	84.91	85.39	84.91	84.9	85.11	85.11
SVM	67.95	69.41	67.95	67.63	68.41	73.14
K-nearest neighbors	82.16	85.39	82.16	81.88	82.75	91.85
BNB	75.33	75.95	75.33	75.29	75.57	80.76
Extra trees	90.75	91.1	90.75	90.75	90.92	97.02
Naïve Bayes	75.0	76.41	75.0	74.83	75.4	82.18
Gradient boosting	84.14	84.53	84.14	84.14	84.32	90.35
AdaBoost	76.76	77.07	76.76	76.76	76.91	83.53
CatBoost	89.65	90.14	89.65	89.64	89.85	95.68
XGBoost	89.32	89.71	89.32	89.31	89.5	96.15

ROC: receiver operating characteristic; AUC: area under the curve; SVM: support vector machine; BNB: Bernoulli Naïve Bayes; XGBoost: extreme gradient boosting.

Figure 13 presents a bar chart of the accuracy scores of the 12 ML models trained on the seven features chosen using the ETC-RFE hybrid feature selection method. ET was the most successful model, with a maximum accuracy of 90.75%, followed by RF (90.64%), CB (89.65%), and XGBoost (89.32%). DT ranked fifth, with an accuracy of 84.91%. SVM had the lowest accuracy (67.95%). Thus, with the chosen features, tree-based ensemble models such as ET, RF, and CB performed well, whereas SVM performed poorly in this configuration.

Figure 13.

Accuracy scores for 12 machine learning models with selected features.

As presented in Table 4, for the combined model performance, the combination of ET & XGBoost and RF & XGBoost yielded the best accuracy (93.17% and 93.28%, respectively) and F1 scores (93.15% and 93.26%, respectively), with ET & XGBoost yielding a slightly higher AUC value (97.68%). The CB & XGBoost and ET & CB combinations also performed well with accuracy of 92% each and AUC values >96%. The GB & CB and GB & XGBoost combinations yielded good results with accuracies of 89%–91%. However, the GB & AB and BNB & GB combinations performed poorly with accuracies of 83.59% and 80.73%, respectively. Overall, combinations containing XGBoost consistently produced better results, especially when combined with ensemble approaches such as ET and RF.

Table 4.

Soft voting results for the top 10 voting model combinations with selected features.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 score (%)	ROC score (%)	AUC value (%)
GB & CB	89.43	89.71	89.43	89.39	89.29	94.88
ET & XGBoost	93.17	93.46	93.17	93.15	93.05	97.68
RF & XGBoost	93.28	93.59	93.28	93.26	93.15	97.48
CB & XGBoost	92.29	92.67	92.29	92.26	92.15	96.94
GB & XGBoost	90.97	91.3	90.97	90.94	90.83	95.92
GB & AB	83.59	83.97	83.59	83.51	83.42	90.47
AB & XGBoost	91.96	92.25	91.96	91.94	91.83	96.52
ET & CB	92.18	92.51	92.18	92.15	92.04	97.43
ET & GB	90.97	91.49	90.97	90.92	90.8	96.28
BNB & GB	80.73	81.03	80.73	80.64	80.56	88.76

ROC: receiver operating characteristic; AUC: area under the curve; BNB: Bernoulli Naïve Bayes; CB: CatBoost; ET: Extra Trees; GB: gradient boosting; RF: Random Forest; XGBoost: extreme gradient boosting.

In Figure 14, the bar chart depicts the accuracy scores of the top ten soft voting model combinations following feature selection. The best accuracy was 93.28% for the RF & XGBoost combination model, closely followed by the ET & XGBoost model (93.17%) and the CB & XGBoost model (92.29%). Other combinations that performed well include ET & CB (92.18%) and AB & XGBoost (91.96%). The BNB & GB combination model performed the worst, with an accuracy of 80.73%. The highest accuracies in this configuration indicate that XGBoost works well together with other ensemble models, specifically RF, ET, and CB.

Figure 14.

Accuracy scores for the top 10 soft voting combination models with selected features.

Table 5 details the performance of the hard voting model combinations, showing balanced results with consistently good precision, recall, and F1 score values in addition to accuracy. The ET & XGBoost and RF & XGBoost combination models yielded the best performance, with precision, recall, and F1 scores of approximately 92.5% and 92.3%, respectively, demonstrating their robustness in detecting both positive and negative cases. The CB & XGBoost combination continued to perform well, scoring approximately 91.6%. The GB & AB and BNB & GB combination models performed worst, with recall and precision both at approximately 80.7%, indicating a limited ability to avoid false positives and false negatives. The high ROC scores of the ET & XGBoost (92.46%) and RF & XGBoost (92.24%) models are in line with these trends and indicate that these models are not only accurate but also dependable for class differentiation.

Table 5.

Top 10 hard voting ensemble model combinations using selected features.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)	ROC score (%)
GB & CB	88.55	88.55	88.55	88.55	88.54
ET & XGBoost	92.51	92.54	92.51	92.51	92.46
RF & XGBoost	92.29	92.32	92.29	92.29	92.24
CB & XGBoost	91.63	91.69	91.63	91.62	91.57
GB & XGB	88.99	89.02	88.99	88.99	89.02
GB & AB	80.73	80.77	80.73	80.73	80.75
AB & XGBoost	84.58	85.21	84.58	84.55	84.75
ET & CB	91.52	91.58	91.52	91.51	91.45
ET & GB	89.43	89.47	89.43	89.43	89.46
BNB & GB	80.73	80.76	80.73	80.73	80.75

ROC: receiver operating characteristic; CB: CatBoost; BNB: Bernoulli Naïve Bayes; GB: gradient boosting; AB: AdaBoost; ET: Extra Trees.

In Figure 15, the bar chart depicts the accuracy scores of the top ten hard voting model combinations following feature selection. The greatest accuracy was 92.51% for the ET & XGBoost combination model, closely followed by the RF & XGBoost model (92.29%) and the CB & XGBoost model (91.63%). The ET & CB combination also performed well (91.52%), whereas AB & XGBoost reached 84.58%. The lowest accuracy (80.73%) was yielded by the GB & AB and BNB & GB models. Although the scores are marginally lower than those observed with soft voting, the results indicate that hard voting combinations incorporating XGBoost and ensemble methods such as ET, RF, and CB are among the most accurate.

Figure 15.

Accuracy scores for the top 10 hard voting combination models with feature selection.

Table 6 presents the HPT outcomes and the resulting model performance. GB emerged as the top performer, with a test accuracy of 92.84%, precision of 93.08%, and an F1 score of 92.82%, indicating good balance across measures. XGBoost, CB, and ET also performed well, with test accuracies of 92.62%, 92.51%, and 92.18%, respectively. RF (91.63%), KNN (87.44%), and DT (87.22%) followed, demonstrating the success of ensemble and distance-based approaches. LR, BNB, and AB performed moderately, with accuracies of 75%–76.43%, while SVM had the lowest test accuracy (71.81%). The best cross-validation (CV) scores corresponded well to the test accuracies, showing good model generalization, particularly for GB and ET.

Table 6.

Hyperparameter tuning results for ML models.

Model	Best parameters	Best CV score (%)	Test accuracy (%)	Precision (%)	Recall (%)	F1 score (%)
Logistic regression	C = 1; solver = lbfgs	74.31	75	75.24	75	74.88
Random forest	max_depth = 20; min_samples_split = 2; n_estimators = 100	88.48	91.63	91.97	91.63	91.6
Decision tree	criterion = entropy; max_depth = None	83.74	87.22	87.58	87.22	87.17
SVM	C = 1; kernel = linear	71	71.81	72.11	71.81	71.62
K-nearest neighbors	n_neighbors = 3; weights = distance	84.04	87.44	89.46	87.44	87.23
BNB	alpha = 0.1	74.17	75	75.26	75	74.87
Extra trees	max_depth = 30; n_estimators = 50	89.25	92.18	92.51	92.18	92.15
Gradient boosting	learning_rate = 0.2; max_depth = 7; n_estimators = 100	89.44	92.84	93.08	92.84	92.82
AdaBoost	learning_rate = 1; n_estimators = 100	75.06	76.43	76.55	76.43	76.37
CatBoost	depth = 9; iterations = 200; learning_rate = 0.1	89.33	92.51	92.79	92.51	92.49
XGBoost	learning_rate = 0.1; max_depth = 9; n_estimators = 100	88.92	92.62	92.83	92.62	92.6

SVM: support vector machine; XGBoost: extreme gradient boosting; BNB: Bernoulli Naïve Bayes; ML: machine learning; CV: cross-validation.
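To illustrate how a "best parameters" and "best CV score" pair can be obtained, the sketch below runs an exhaustive grid search with k-fold cross-validation over a small k-nearest-neighbours classifier. It is a standard-library illustration under stated assumptions: the toy two-cluster data, the parameter grid, the five interleaved folds, and all function names are hypothetical, and the study’s actual tuning procedure and search space are not described in enough detail to reproduce here.

```python
import itertools
import math
import random
from collections import defaultdict

def knn_predict(train_X, train_y, x, n_neighbors, weights):
    """Predict one label with k-nearest neighbours (Euclidean distance)."""
    nearest = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )[:n_neighbors]
    votes = defaultdict(float)
    for dist, label in nearest:
        # "distance" weights closer neighbours more; "uniform" counts each once
        votes[label] += 1.0 / (dist + 1e-12) if weights == "distance" else 1.0
    return max(sorted(votes), key=votes.get)

def cv_accuracy(X, y, n_folds, **params):
    """Mean accuracy over n_folds interleaved folds."""
    indices = list(range(len(X)))
    fold_scores = []
    for fold in range(n_folds):
        test_idx = indices[fold::n_folds]
        held_out = set(test_idx)
        train_X = [X[i] for i in indices if i not in held_out]
        train_y = [y[i] for i in indices if i not in held_out]
        hits = sum(
            knn_predict(train_X, train_y, X[i], **params) == y[i]
            for i in test_idx
        )
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / n_folds

def grid_search(X, y, param_grid, n_folds=5):
    """Try every parameter combination and keep the best mean CV accuracy."""
    best_params, best_score = None, -1.0
    for combo in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid, combo))
        score = cv_accuracy(X, y, n_folds, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical two-cluster toy data standing in for the patient features.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)] + \
    [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(60)]
y = [0] * 60 + [1] * 60

param_grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
best_params, best_score = grid_search(X, y, param_grid)
print(best_params, round(best_score, 3))
```

The best CV score is computed only on the training data, which is why it can differ from the accuracy measured afterwards on the held-out test set.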

Table 7 compares the performance of several soft voting model combinations after hyperparameter tuning. The CB & XGBoost model achieved the greatest accuracy (93.61%) together with a strong AUC value of 97.32%. The ET & XGBoost and GB & CB models followed, with identical accuracy (93.39%) and F1 scores (93.38%) and AUC values of 97.59% and 97.42%, respectively. ET & GB (93.17%) and RF & XGBoost (93.06%) also exceeded 93% accuracy, and GB & XGBoost reached 92.95%. Accuracy was between 92% and 93% for GB & AB, AB & XGBoost, and ET & CB, with ET & CB having the highest AUC value (97.68%). The BNB & GB model reached an accuracy of 92.29% but had the lowest AUC value (95.43%). Overall, ensembles with XGBoost and CB consistently performed well across measures, and the proposed CB & XGBoost model achieved the highest accuracy.

Table 7.

Hyperparameter tuning for the soft voting ensemble model.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 score (%)	ROC score (%)	AUC value (%)
GB & CB	93.39	93.62	93.39	93.38	93.28	97.42
ET & XGB	93.39	93.62	93.39	93.38	93.28	97.59
RF & XGB	93.06	93.28	93.06	93.05	92.95	97.27
CB & XGB	93.61	93.82	93.61	93.6	93.51	97.32
GB & XGB	92.95	93.21	92.95	92.93	92.83	97.25
GB & AB	92.07	92.38	92.07	92.04	91.94	96.45
AB & XGB	92.73	92.96	92.73	92.71	92.62	96.44
ET & CB	92.29	92.6	92.29	92.27	92.16	97.68
ET & GB	93.17	93.37	93.17	93.16	93.07	97.48
BNB & GB	92.29	92.54	92.29	92.27	92.17	95.43

ROC: receiver operating characteristic; AUC: area under the curve; CB: CatBoost; BNB: Bernoulli Naïve Bayes; GB: gradient boosting; AB: AdaBoost.

In Figure 16, the bar chart depicts the accuracy scores of selected soft voting model combinations following HPT. The highest accuracy was 93.61% for CB & XGBoost, closely followed by GB & CB and ET & XGBoost (93.39% each). RF & XGBoost followed, with an accuracy of 93.06%. Other combinations such as ET & GB and ET & CB also performed well, with accuracies exceeding 92%. BNB & GB ranked near the bottom of the chosen combinations but nevertheless achieved a reasonably high accuracy of 92.29%. With most combinations obtaining accuracies >92%, HPT further improved these models’ performances and showed the effectiveness of soft voting when paired with optimal parameters.

Figure 16.

Accuracy of the soft voting ensemble model after hyperparameter tuning.

We compared the performance of the deep learning models across several metrics (Table 8). ANN surpassed the others, with an accuracy of 84.69%, precision of 85.28%, recall of 84.69%, and AUC value of 0.9028, making it the strongest of the four deep learning models. The second-best performance was demonstrated by MLP, with an accuracy of 81.28%, precision of 82.64%, and AUC value of 0.9012. CNN and RNN performed worse in terms of class distinction, with accuracies of 77.75% and 78.30% and AUC values of 0.8561 and 0.8486, respectively.

Table 8.

Performance of deep learning models.

Model	Accuracy (%)	Precision (%)	Recall (%)	AUC
MLP	81.28	82.64	81.28	0.9012
ANN	84.69	85.28	84.69	0.9028
CNN	77.75	79.83	77.75	0.8561
RNN	78.30	79.49	78.30	0.8486

AUC: area under the curve; MLP: multilayer perceptron; ANN: artificial neural network; CNN: convolutional neural network; RNN: recurrent neural network.

The bar chart in Figure 17 depicts the accuracy of the four deep learning models on the classification task. The most successful model in this comparison was the ANN model, which achieved the highest accuracy of 84.69%. The MLP model also performed well but was slightly less accurate than the ANN model, with an accuracy of 81.28%. The RNN model achieved a reasonable accuracy of 78.30% but did not perform as well as the MLP and ANN models. The CNN model achieved the lowest accuracy in this group (77.75%). Overall, the ANN and MLP models exhibited higher accuracy, making them the preferred deep learning options for this dataset, whereas the CNN and RNN models exhibited only moderate classification efficacy.

Figure 17.

Accuracy of the deep learning models.

Table 9 summarizes the cross-validation performance of several ensemble model combinations in terms of accuracy, precision, recall, F1 score, and AUC value, together with the corresponding standard deviation (SD) values, which indicate performance stability. With mean accuracy values ranging from 90.43% to 91.80%, all ensemble combinations performed well, suggesting consistent asthma risk prediction across models. The CB & XGBoost combination was the most balanced ensemble in terms of classification performance, achieving the highest mean accuracy (91.80%), precision (91.98%), recall (91.80%), and F1 score (91.79%). In terms of discriminative capacity, the ET & GB and ET & XGBoost models had the highest mean AUC values (96.95% and 96.91%, respectively), closely followed by ET & CB (96.88%). This suggests that ensembles that include ET are especially effective at differentiating between cases with and without asthma. The SD values across metrics were low (all <0.8), demonstrating model stability. Notably, the ET & GB model exhibited the lowest variability in accuracy, precision, recall, and F1 score (SD of approximately 0.08–0.14), indicating highly consistent performance across validation folds. In contrast, the BNB & GB model demonstrated the poorest overall performance, with the lowest mean AUC (94.02%) and a mean accuracy and F1 score of approximately 90.4%. This suggests that BNB contributes less in ensemble settings than tree-based or boosting models. Thus, the table shows that tree-based and boosting ensembles, particularly those involving XGBoost, ET, and CB, perform well and consistently, making them suitable for asthma risk prediction.

Table 9.

Cross-validation results for soft voting ensembles (after hyperparameter tuning).

Model combination	Mean accuracy (%)	Std accuracy (%)	Mean precision (%)	Std precision (%)	Mean recall (%)	Std recall (%)	Mean F1 score (%)	Std F1 score (%)	Mean AUC (%)	Std AUC (%)
GB & CB	91.5783	0.579916	91.73825	0.619629	91.5783	0.579916	91.57036	0.578898	96.70765	0.305369
ET & XGBoost	91.73283	0.446052	91.87309	0.428513	91.73283	0.446052	91.72581	0.447766	96.90931	0.195989
RF & XGBoost	91.64458	0.752797	91.78134	0.752962	91.64458	0.752797	91.63768	0.753704	96.56531	0.229227
CB & XGBoost	91.79888	0.534669	91.97777	0.553527	91.79888	0.534669	91.79018	0.534658	96.67629	0.26869
GB & XGBoost	91.60038	0.354738	91.76424	0.398555	91.60038	0.354738	91.59223	0.353245	96.55996	0.256328
GB & AB	91.15953	0.510864	91.34652	0.586409	91.15953	0.510864	91.1498	0.50765	95.3012	0.709122
AB & XGBoost	91.42419	0.605085	91.5979	0.584264	91.42419	0.605085	91.4151	0.607129	95.69769	0.388505
ET & CB	91.31396	0.584645	91.51055	0.624196	91.31396	0.584645	91.30379	0.584026	96.88469	0.208386
ET & GB	91.15961	0.083115	91.35491	0.136119	91.15961	0.083115	91.14914	0.083297	96.94776	0.122886
BNB & GB	90.43205	0.266184	90.65327	0.263991	90.43205	0.266184	90.41904	0.26747	94.02273	0.59936

AUC: area under the curve; CB: CatBoost; BNB: Bernoulli Naïve Bayes; GB: gradient boosting; ET: Extra Trees; RF: Random Forest; AB: AdaBoost.
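The mean and Std columns in Table 9 summarize per-fold scores. As a minimal sketch of how such a summary can be computed, the helper below takes a list of fold scores and returns their mean and standard deviation. The function name `summarize_folds` and the five fold accuracies are hypothetical, and the section does not state whether the population or the sample standard deviation was reported, so both options are shown.

```python
from statistics import mean, pstdev, stdev


def summarize_folds(scores, sample=False):
    """Return (mean, standard deviation) of per-fold scores.

    sample=False uses the population SD (divides by n), which matches
    NumPy's default; sample=True uses the sample SD (divides by n - 1).
    """
    spread = stdev(scores) if sample else pstdev(scores)
    return mean(scores), spread


# Hypothetical accuracies (%) from a 5-fold cross-validation run.
fold_accuracy = [91.2, 92.4, 91.8, 91.5, 92.1]
print(summarize_folds(fold_accuracy))
print(summarize_folds(fold_accuracy, sample=True))
```

With only a handful of folds, the two SD definitions differ noticeably, so it is worth stating which one a table reports.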

The distribution of accuracy scores across cross-validation folds for different combinations of soft voting ensembles after tuning is shown in Figure 18. Beyond single-point accuracy estimates, the violin plots provide insights into model stability by capturing both central tendency and performance variability. Among the assessed ensembles, the CB & XGBoost voting classifier demonstrated consistent performance across folds, with a high median accuracy and a relatively narrow distribution. Several other ensembles had lower median values or broader distributions, indicating greater sensitivity to data partitioning. The limited spread of the CB & XGBoost ensemble supports its robustness and generalization ability. Overall, the findings showed that using soft voting to combine CB and XGBoost strikes a balance between model diversity and predictive stability, making it a good option for asthma prediction tasks.

Figure 18.

Distribution of accuracy scores across CV folds for the tuned soft voting ensemble model. CV: cross-validation.

Figure 19 depicts the decision curve analysis (DCA), which assesses the clinical utility of the CB & XGBoost voting classifier over a broad range of probability thresholds. Over almost the whole threshold range (approximately 0.05–0.95), the CB & XGBoost model consistently generated a higher net benefit than both the “treat-all” and “treat-none” strategies. The net benefit of the treat-all strategy decreased rapidly as the threshold probability rose, turning sharply negative at higher thresholds; this strategy is therefore unsuitable for clinical decision-making in asthma risk prediction because it leads to many needless interventions and false positives. In contrast, the treat-none strategy has a net benefit of zero at every threshold, as it involves no intervention and offers no predictive value. Crucially, even at higher thresholds, the CB & XGBoost ensemble model maintained a steady and positive net benefit, demonstrating its reliability when physicians need greater assurance prior to action. This implies that the suggested ensemble model can facilitate individualized, risk-based decision-making, providing a clinical benefit over non-model-based approaches. Overall, the DCA supports the applicability of the CB & XGBoost ensemble for real-world asthma risk prediction and decision support by indicating that it is clinically useful as well as statistically reliable.

Figure 19.

Decision curve analysis for the CB & XGBoost ensemble model. CB: CatBoost; XGBoost: extreme gradient boosting.
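The net benefit plotted in a decision curve is commonly defined as TP/n − (FP/n) × p_t/(1 − p_t), where p_t is the threshold probability and n is the number of patients. The sketch below computes it for a model and for the treat-all strategy; the treat-none strategy is zero by definition. The function names and the eight example patients are hypothetical and serve only to illustrate the calculation.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone, regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical labels (1 = asthma) and predicted risks for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
for threshold in (0.2, 0.5, 0.8):
    print(threshold, net_benefit(y_true, y_prob, threshold),
          net_benefit_treat_all(y_true, threshold))
```

Evaluating both functions over a grid of thresholds and plotting the results gives a curve of the kind shown in Figure 19.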

In Figure 20, a calibration curve shows how closely the observed fraction of asthma cases in each probability bin agrees with the probabilities predicted by the CB & XGBoost voting classifier. Perfect calibration is shown by the diagonal reference line. The horizontal dashed red line represents the “no-skill” baseline classifier, which assigns every sample the same probability, equal to the dataset’s overall asthma prevalence (0.50), regardless of the feature inputs. This baseline makes it possible to assess whether the suggested ensemble offers probabilistic predictions that go beyond prevalence-based estimation: a curve that follows the diagonal rather than the horizontal baseline indicates better alignment between predicted probabilities and observed outcomes. The CB & XGBoost ensemble exhibited good calibration over the majority of probability ranges, with estimated risks nearly matching the ideal diagonal line, especially at moderate to high predicted probabilities (≥0.6). For people with higher predicted asthma risk, the model therefore offers accurate and interpretable risk estimates because the observed outcome frequencies in these bins closely match the predicted values. At lower probability ranges (<0.4), the model showed slight underestimation and overestimation in some bins, reflecting increased uncertainty when predicting low-risk cases, a phenomenon frequently observed in imbalanced clinical datasets. These deviations, however, were small and did not depart substantially from the diagonal reference line.

Overall, the calibration analysis supports the applicability of the CB & XGBoost ensemble for risk-based clinical decision support by indicating that it not only achieves excellent discriminative performance but also generates well-calibrated probability estimates. Considered in conjunction with the decision curve analysis, these results show that the model provides reliable probability outputs, which are critical for real-world asthma risk prediction.

Figure 20.

Calibration curve analysis for the CB and XGBoost ensemble model. CB: CatBoost; XGBoost: extreme gradient boosting.
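A calibration curve of this kind is built by grouping predictions into probability bins and comparing, within each bin, the mean predicted probability with the observed fraction of positive cases. The following sketch shows that binning step with equal-width bins; the function name `calibration_bins`, the choice of five bins, and the example data are assumptions made for illustration and are not taken from this study.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed positive fraction)
    for each non-empty equal-width probability bin."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((label, prob))

    curve = []
    for members in bins:
        if not members:
            continue  # empty bins contribute no point to the curve
        mean_predicted = sum(prob for _, prob in members) / len(members)
        observed_fraction = sum(label for label, _ in members) / len(members)
        curve.append((mean_predicted, observed_fraction))
    return curve


# Hypothetical labels (1 = asthma) and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 1, 1]
y_prob = [0.10, 0.15, 0.30, 0.35, 0.62, 0.65, 0.90, 0.95]
for mean_predicted, observed_fraction in calibration_bins(y_true, y_prob):
    print(round(mean_predicted, 3), observed_fraction)
```

Points lying on the diagonal (mean predicted probability equal to observed fraction) correspond to perfect calibration. scikit-learn offers a comparable utility, `sklearn.calibration.calibration_curve`.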

Figure 21 shows the distribution of misclassification errors of the CB-XGBoost voting classifier across two clinically significant characteristics, age and sleep quality. False positives (predicted asthma, actual non-asthma) and false negatives (predicted non-asthma, actual asthma) are highlighted to examine error behavior within this feature space. The incorrectly classified samples are distributed throughout a broad range of ages and sleep quality levels, with no discernible clustering in any particular age or sleep quality region. This implies that the observed errors are mainly caused by overlapping feature characteristics rather than systematic bias toward specific age groups or sleep quality extremes. Because both false positives and false negatives are present throughout the feature space, the model does not disproportionately misclassify people from a particular group. Additionally, the lack of dense error concentrations suggests that the CB-XGBoost ensemble does not overfit to either of these characteristics individually and is consistent with the model capturing nonlinear relationships between age and sleep quality. This pattern highlights the intrinsic complexity of asthma prediction from lifestyle and demographic characteristics while supporting the robustness of the suggested model. Beyond traditional performance measures, this kind of analysis offers additional interpretability and transparency.

Figure 21.

Misclassification error distribution by age and sleep quality for the CB-XGBoost voting classifier.

Discussion

Table 10 compares the performance of several asthma prediction models from previous studies with that of the suggested model, emphasizing variations in feature selection strategies, methodologies, and assessment criteria. Jeddi et al.⁶ achieved 84.9% accuracy, 87% precision, 88% recall, and 86% F1 score by using an RF model with chi-square feature selection. Murad et al.⁷ used a DT without feature selection to achieve an accuracy of 87%, with lower precision (83%) and recall (80%). Joo et al.⁹ employed XGBoost with feature extraction and achieved 87.1% accuracy, a remarkable 97.9% recall, and an AUC value of 93%. Vatsal et al.¹¹ achieved 90% accuracy and good overall performance using principal component analysis (PCA) and an ensemble approach; however, they did not report an AUC value. Bose et al.¹² used XGBoost in conjunction with filter and embedded techniques, achieving 95% precision but only 81% accuracy. Tomita et al.²¹ achieved 81% accuracy and an AUC value of 85% using XGBoost without feature selection. Gaudillo et al.²⁴ reported substantially lower performance, with an accuracy of 62.5%, precision of 65.3%, and recall of 69%, when they combined RFE and RF with an SVM model.

Table 10.

Comparison of the proposed work with previous studies.

Reference	Feature selection	Method	Accuracy (%)	Precision (%)	Recall (%)	F1 score (%)	AUC (%)
Jeddi et al.⁶	Chi-square	Random Forest	84.9	87	88	86.00	×
Murad et al.⁷	×	Decision Tree	87	83.00	80.00	81.00	×
Joo et al.⁹	Feature extraction	XGBoost	87.1	82.5	97.9	×	93.00
Vatsal et al.¹¹	PCA	Ensemble	90	89.00	87.00	91.00	×
Bose et al.¹²	Filter + embedded	XGBoost	81.00	95.00	82.00	88.00	×
Tomita et al.²¹	×	XGBoost	81.00	80.00	81.00	81.00	85.00
Gaudillo et al.²⁴	RFE + RF	RF + SVM	62.5	65.3	69.00	×	×
Proposed work	RFE + ETC	CB + XGBoost (HPT)	93.61	93.82	93.61	93.60	97.32

× indicates that the corresponding technique or metric was not used or reported in the previous work.

AUC: area under the curve; RFE: recursive feature elimination; SVM: support vector machine; ETC: Extra Trees Classifier; CB: CatBoost; XGBoost: extreme gradient boosting; HPT: hyperparameter tuning; PCA: principal component analysis.

Using RFE with ETC for feature selection and a CB-XGBoost ensemble with HPT, the suggested model achieved an accuracy of 93.61%, precision of 93.82%, recall of 93.61%, F1 score of 93.60%, and AUC value of 97.32%, exceeding the accuracy reported by each of the previous models listed in Table 10. This improved performance shows how well RFE works when combined with an ensemble and HPT, enabling the model to concentrate on the most important features while utilizing the advantages of both XGBoost and CB. HPT adjusts the model for optimal predictive performance, which increases its resilience and adaptability to a range of asthma risk factors. This methodology distinguishes the suggested model from previous models, offering a more precise, dependable, and clinically useful tool for asthma prediction.

Strength of work

The asthma prediction model developed in this study reached a maximum accuracy of 93.61%, which supports its reliability for clinical use. The approach evaluated four deep learning architectures and 12 ML models, and rigorous HPT was applied to optimize their performance. By resolving class imbalance using SMOTE, the model learnt from both majority and minority classes, increasing prediction accuracy for underrepresented cases. Feature selection using RFE and ETC improved the interpretability and efficiency of the model. The addition of SHAP and LIME, which offer distinct insights into feature importance, makes the model’s predictions easier for clinicians to understand. A Flask-powered web interface enabled real-time access to predictions, increasing ease of use for patients and healthcare professionals. The model’s potential for widespread clinical use is further supported by its scalability and interoperability with current healthcare systems, making it a dependable, understandable, and easily available tool for proactive asthma control.

Web interface

Figure 22 depicts the workflow of the asthma prediction system, which links a web interface to a Flask server and an ML model. The user initiates the process by entering the relevant health data into the web interface, which transmits the information to the Flask server. The server then passes the data to the trained ML model, which processes it and determines the probability of asthma. Once the prediction has been generated, the outcome is returned to the Flask server, which forwards it to the web interface. The user is then shown a clear result (such as “Yes” or “No”) that represents the outcome of the prediction. With this configuration, the user and the ML model can communicate easily and effectively, and the user receives real-time asthma risk evaluations.

Figure 22.

Workflow for the web interface.
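The server-side step of this workflow (receive the form data, validate it, call the model, and return a “Yes” or “No” result) can be sketched as a plain Python function that is independent of the web framework. Everything named here is hypothetical: the field names are snake-case versions of the seven inputs listed for the interface, `handle_prediction` is an illustrative helper, and `demo_model` is a stand-in for the trained ensemble rather than the model from this study.

```python
# Hypothetical field names for the seven inputs collected by the form.
FEATURES = (
    "age", "sleep_quality", "dust_exposure", "hay_fever",
    "chest_tightness", "coughing", "nighttime_symptoms",
)


def handle_prediction(payload, predict_proba, threshold=0.5):
    """Validate submitted form data and return (response body, HTTP status).

    payload: mapping of field name to submitted value.
    predict_proba: callable taking a list of floats (in FEATURES order)
    and returning the predicted probability of asthma.
    """
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return {"error": "missing fields: " + ", ".join(missing)}, 400
    try:
        row = [float(payload[name]) for name in FEATURES]
    except (TypeError, ValueError):
        return {"error": "all fields must be numeric"}, 400

    probability = predict_proba(row)
    label = "Yes" if probability >= threshold else "No"
    return {"asthma": label, "probability": round(probability, 4)}, 200


def demo_model(row):
    """Stand-in for a trained classifier (illustration only)."""
    chest_tightness, coughing = row[4], row[5]
    return 0.92 if chest_tightness + coughing >= 1 else 0.08


form = {name: 1 for name in FEATURES}
print(handle_prediction(form, demo_model))
print(handle_prediction({"age": 30}, demo_model))
```

In a Flask application, a route handler could call such a function with the parsed request data and return the resulting body and status code; keeping the logic in a separate function also makes it testable without starting a server.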

Figure 23 depicts the “asthma prediction” web interface, which includes a form-based input panel and matching output panels. The input form, which appears on both the left and right panels, invites users to enter information about asthma risk factors such as age, sleep quality, dust exposure, hay fever, chest tightness, coughing, and nighttime symptoms. One of two potential outcomes is shown in the middle panel after users enter their data and press the “predict” button. If the person is predicted to be at high risk, the interface displays “You have asthma” together with an accompanying message. Otherwise, the interface displays “You are free from Asthma” together with an accompanying message. This user-friendly interface provides clear advice based on the prediction outcome, assisting users in understanding their asthma risk and offering practical suggestions for reducing or managing it.

Figure 23.

Patient using the web interface.

Interpretability

In Figure 24, the SHAP waterfall plot shows the contribution of each feature to the final prediction of the model, with an output of f(x) = 2.814 and a base value of E[f(X)] = 0.205. The model output was pushed upward by features such as age (+0.54), dust exposure (+0.41), and sleep quality (+0.28), indicating that, for this instance, these factors are linked to a higher chance of the predicted outcome. The prediction score was considerably lowered by coughing (−0.765) and chest tightness (−0.734), which suggests a lower chance of the predicted outcome for the values these symptoms take in this instance. The negative effect of nighttime symptoms was smaller but still noticeable (−0.49). This plot highlights that the final prediction results from a balance in which positive factors such as age and dust exposure counteract the reductions caused by symptoms such as coughing and chest tightness, ultimately arriving at a prediction score of 2.814. This visual provides insight into the model’s reasoning by showing the individual impact of each feature on the prediction.

Figure 24.

SHAP waterfall plot illustrating feature contributions to model prediction. SHAP: Shapley additive explanations.

Figure 25 presents a series of SHAP decision plots, which begin with a base value and progress to final predictions of 2.47, 5.78, 2.81, and 3.19, indicating how particular features affect the model’s prediction scores across different instances. Coughing and nighttime symptoms are often identified as important determinants in each instance, either increasing or reducing the prediction depending on how they interact with other variables. Sleep quality and age usually exert positive effects, raising the score, particularly in higher predictions such as 5.78. In contrast, hay fever and chest tightness frequently lower the score, indicating their association with a decreased chance of the predicted outcome. Dust exposure and coughing tend to exert more positive than negative contributions in high-prediction instances, demonstrating how the model balances numerous feature influences to arrive at its output. By showing which features increase or decrease predictions and demonstrating the context-dependent relationships between variables, these plots provide a thorough understanding of the model’s decision-making process.

Figure 25.

SHAP decision plot illustrating feature impact on model prediction. SHAP: Shapley additive explanations.

Figure 26 presents a bar chart of mean SHAP values for each feature, which indicate feature relevance and represent each feature’s average contribution to the model’s predictions. Age was the most important feature, with the greatest mean SHAP value (1.01). Coughing and chest tightness ranked second and third, with mean SHAP values of 0.93 and 0.89, respectively, indicating that respiratory symptoms exerted a major influence on the model’s predictions. Nighttime symptoms (0.86) and dust exposure (0.77) also made substantial contributions, indicating that predictions are influenced by both symptom severity and environmental conditions. The relatively lower SHAP values for sleep quality (0.68) and hay fever (0.48) suggest that they exert a moderate to slight impact on the output. This feature ranking provides insights into the main elements influencing the model’s predictions, indicating that age and coughing are given the most importance, whereas hay fever exerts the smallest effect.

Figure 26.

Mean SHAP values highlighting key predictive features. SHAP: Shapley additive explanations.
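A global ranking of this kind is usually obtained by averaging the absolute SHAP value of each feature over all explained samples and sorting the result. The sketch below assumes that convention (the usual one for SHAP bar charts); the function name `mean_abs_shap`, the three feature names, and the two rows of SHAP values are hypothetical and do not reproduce the values in Figure 26.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples.

    shap_rows: one list of per-feature SHAP values for each sample.
    Returns (feature, mean |SHAP|) pairs sorted from most to least important.
    """
    n_samples = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for two samples and three features.
names = ["age", "coughing", "hay_fever"]
rows = [
    [1.2, -0.9, 0.3],
    [-0.8, 1.0, -0.5],
]
for feature, value in mean_abs_shap(rows, names):
    print(feature, round(value, 3))
```

Taking absolute values before averaging matters: a feature whose contributions are large but alternate in sign would otherwise appear unimportant.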

Figure 27 depicts the LIME explanation for a single prediction, in which the model assigned a 92% probability to asthma and an 8% probability to no asthma. The key factors that influenced the prediction are shown on the right, with chest tightness (−0.73), hay fever (−0.43), coughing (−0.77), and nighttime symptoms (−0.94) being major contributors to the asthma prediction (highlighted in orange). Sleep quality (−1.12), dust exposure (−0.56), and age (−0.76), displayed in blue, contributed to the non-asthma prediction but were outweighed by the asthma-indicating variables. This analysis demonstrates that indications of respiratory discomfort such as chest tightness and coughing substantially increased the predicted chance of asthma in this instance, with sleep quality and age playing minor roles in moderating this outcome.

Figure 27.

LIME explanation for asthma prediction. LIME: local interpretable model-agnostic explanations.

Future work

In subsequent studies, we intend to improve the generalizability and accuracy of our asthma prediction model across a range of individuals by integrating more diverse datasets from various geographical locations and healthcare environments. Such a model would capture a greater range of influences on asthma by considering additional risk factors such as lifestyle, genetic, and environmental variables, including pollution levels and climatic data, allowing more individualized predictions. Applying the model to longitudinal data will allow us to monitor patient changes over time and improve its capacity to predict asthma exacerbations and progression for better ongoing care. Clinicians will have greater confidence in the model’s results if the interpretability techniques are complemented by more advanced tools that offer more precise insights into the variables influencing each prediction. Further optimization of real-time processing would make integration with electronic health record (EHR) systems easier, enabling effective clinical use. To help patients and physicians use the tool for asthma monitoring and management, we also intend to enhance the online interface to make it more interactive and adaptable. Finally, prospective clinical validation, multi-center external testing, and integration into healthcare information systems for practical evaluation should be the main topics of future study.

Clinical interpretation and explainability consideration

The suggested asthma prediction models were made more transparent and easier to understand by using SHAP and LIME as post-hoc explainability tools. These techniques were employed to find consistent patterns of feature influence across individual and global predictions rather than to obtain conclusive clinical reasons. Age, sleep quality, dust exposure, hay fever, chest tightness, coughing, and nighttime symptoms were consistently identified by SHAP as major contributors to asthma risk across several correctly classified cases, indicating a consistent and clinically credible explanation pattern. These features support the face validity of the model’s decision logic by being consistent with recognized asthma risk indicators and symptom-based clinical evaluations. From the standpoint of clinical decision support, these explanations could help clinicians by (a) emphasizing the main risk factors for specific patients; (b) supporting risk stratification; and (c) promoting patient–clinician communication by offering understandable justification for predictions. However, the explanations should be regarded as supporting insights rather than strict clinical guidelines. One limitation of the current investigation is the absence of direct clinician input or formal quantitative stability analysis of SHAP/LIME explanations. To improve clinical application, future research will include longitudinal evaluation of explanation robustness, explanation consistency measures across repeated sampling, and clinician-in-the-loop validation.

Conclusions

The present work offers a framework for predicting asthma that uses deep learning and ML models to produce accurate, interpretable, and easily accessible predictions. A model that accurately predicted asthma risk was created by resolving class imbalance using SMOTE and selecting features using RFE with ETC. Among all the evaluated approaches, the hyperparameter-optimized soft voting ensemble of CB and XGBoost demonstrated the best performance, achieving the highest accuracy of 93.61%, which supports the effectiveness of the proposed ensemble learning framework for asthma risk prediction. The use of SHAP and LIME improved interpretability, which can help clinicians to understand the contribution of each feature and to trust the model’s predictions. Furthermore, the intuitive web interface enhances accessibility and usefulness by enabling medical professionals to use the model in real time. Despite these positive outcomes, this study has dataset-related limitations. The dataset, which was obtained from Kaggle, contains a relatively limited number of patient records, which may have limited the model’s capacity to generalize across a variety of demographics. Internal CV was performed; however, independent datasets were not used for external validation. Future research should therefore apply the suggested models to multi-center or real-world clinical data, using larger and more varied datasets, to evaluate their robustness and external performance and to strengthen the generalizability and clinical usefulness of the approach. Overall, this research improves asthma prediction tools and provides a practical solution that encourages proactive and individualized patient care. It has the potential to enhance asthma management and healthcare effectiveness.

Footnotes

Acknowledgments

We acknowledge the providers of the publicly accessible dataset used in this investigation. AI tools (ChatGPT and Grammarly) were used only for grammatical correction and language refinement.

CRediT author statement

Md Mahbubur Rahman Druvo: Conceptualization, methodology, and software; Ashfaqul Islam: Validation, visualization, and writing–review & editing; Abir Chowdhury: Resources, formal analysis, and writing–original draft; Khandaker Mohammad Mohi Uddin: Supervision, validation, methodology, and writing–original draft.

Consent to participate

Since this study did not include direct human participant interaction and used a publicly accessible anonymized dataset, participation consent was not required.

Data availability

The data supporting this study’s findings are available from the corresponding author upon reasonable request.

Declaration of conflicts of interest

The authors declare that there are no conflicts of interest.

Ethical approval

Since this study did not include direct human participant interaction and used a publicly accessible anonymized dataset (Kaggle Asthma Dataset 2021), ethical approval was not necessary.

Funding

None.

ORCID iD

Khandaker Mohammad Mohi Uddin

References

1. Sockrider, Fussner. What is asthma? Am J Respir Crit Care Med 2020; 202: P25–P26.

2. Krishnan, Lemanske Jr, Canino, et al. Asthma outcomes: symptoms. J Allergy Clin Immunol 2012; 129: S124–S135.

3. Nocon, Booth. The social impact of asthma. Fam Pract 1991; 8: 37–41.

4. Wechsler, Castro, Lehman, et al.; NHLBI Asthma Clinical Research Network. Impact of race on asthma treatment failures in the asthma clinical research network. Am J Respir Crit Care Med 2011; 184: 1247–1253.

5. Zhang, Paré, Sandford AJ. Recent advances in asthma genetics. Respir Res 2008; 9: 4.

6. Jeddi, Gryech, Ghogho, et al. Machine learning for predicting the risk for childhood asthma using prenatal, perinatal, postnatal and environmental factors. Healthcare (Basel) 2021; 9: 1464.

7. Murad, Adhikary, Muzahid AJM, et al. AI powered asthma prediction towards treatment formulation: an android app approach. Intelligent Automation & Soft Computing 2022; 34: 87–103.

8. Abhadiomhen, Zhou, et al. Asthma prediction via affinity graph enhanced classifier: a machine learning approach based on routine blood biomarkers. J Transl Med 2024; 22: 100.

9. Joo, Lee, et al. Increasing the accuracy of the asthma diagnosis using an operational definition for asthma and a machine learning method. BMC Pulm Med 2023; 23: 196.

10. Xie. Predicting the risk of asthma development in youth using machine learning models. PloS One 2025; 20: e0336591.

11. Vatsal, Kumar S, Riya, et al. Advanced ensemble learning approach for asthma prediction: optimization and evaluation. In: 2024 International Conference on Automation and Computation (AUTOCOM), Dehradun, India, March 2024, pp. 283–288.

12. Bose, Kenyon, Masino AJ. Personalized prediction of early childhood asthma persistence: a machine learning approach. PloS One 2021; 16: e0247784.

13. Kothalawala, Murray, Simpson, et al.; STELAR/UNICORN investigators. Development of childhood asthma prediction models using machine learning approaches. Clin Transl Allergy 2021; 11: e12076.

14. Finkelstein, Jeong IC. Machine learning approaches to personalize early prediction of asthma exacerbations. Ann N Y Acad Sci 2017; 1387: 153–165.

15. Ravi, Lokesh, et al. A prognostic model to improve asthma prediction outcomes using machine learning. Open Bioinform J 2024; 17: e18750362306414.

16. Lugogo, DePietro, Reich, et al. A predictive machine learning tool for asthma exacerbations: results from a 12-week, open-label study using an electronic multi-dose dry powder inhaler with integrated sensors. J Asthma Allergy 2022; 15: 1623–1637.

17. Hussain, Shah, Mukherjee, et al. Predicting the risk of asthma attacks in children, adolescents and adults: protocol for a machine learning algorithm derived from a primary care-based retrospective cohort. BMJ Open 2020; 10: e036099.

18. de Hond, Kant, Honkoop, et al. Machine learning did not beat logistic regression in time series prediction for severe asthma exacerbations. Sci Rep 2022; 12: 20363.

19. Luo, Stone, Fassl, et al. Predicting asthma control deterioration in children. BMC Med Inform Decis Mak 2015; 15: 84.

20. Gunawardana, Viswakula, Rannan-Eliya, et al. Machine learning approaches for asthma disease prediction among adults in Sri Lanka. Health Informatics J 2024; 30: 14604582241283968.

21. Tomita, Yamasaki, Katou, et al. Construction of a diagnostic algorithm for diagnosis of adult asthma using machine learning with random forest and XGBoost. Diagnostics (Basel) 2023; 13: 3069.

22. AlSaad, Malluhi, Janahi, et al. Predicting emergency department utilization among children with asthma using deep learning models. Healthcare Analytics 2022; 2: 100050.

23. Moraes, Dai, et al. Early prediction of pediatric asthma in the Canadian Healthy Infant Longitudinal Development (CHILD) birth cohort using machine learning. Pediatr Res 2024; 95: 1818–1825.

24. Gaudillo, Rodriguez JJR, Nazareno, et al. Machine learning approach to single nucleotide polymorphism-based asthma prediction. PloS One 2019; 14: e0225574.

25. Shannu AI. Asthma_Dataset_2021, Comprehensive Health Data for Asthma Diagnosis in Indian Patients, https://www.kaggle.com/datasets/shaannuai/asthma-dataset-2021, 2021.

26. Tantisira, et al. Genome wide association study to predict severe asthma exacerbations in children using random forests classifiers. BMC Med Genet 2011; 12: 90.

27. Prokhorenkova, Gusev, Vorobev, et al. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems 2018; 31.

28. Badnjević, Pokvic, Cifrek, et al. Classification of asthma using artificial neural network. In: 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, May 2016, pp. 387–390.

29. Chatzimichail, Paraskakis, Rigas. Predicting asthma outcome using partial least square regression and artificial neural networks. Advances in Artificial Intelligence 2013; 2013: 1–7.

30. Porat, Friedlander. Performance analysis of parameter estimation algorithms based on high‐order moments. International Journal of Adaptive Control and Signal Processing 1989; 3: 191–229.