Abstract
Objective
Maternal health risks should be identified early to improve pregnancy outcomes and minimize complications during pregnancy. This study designs and evaluates a leakage-controlled hybrid machine learning framework that uses optimized ensemble models to predict maternal health risk.
Methods
We used the publicly available UCI Maternal Health Risk dataset (n = 1015). The dataset was split by stratified sampling into 80% training and 20% independent test subsets using a fixed random seed (42). SMOTEENN resampling was applied only to the training data to avoid data leakage, and internal cross-validation on the training data was used for hyperparameter tuning. We developed optimized XGBoost, blending, and hybrid stacking models. Model performance was measured using accuracy, precision, recall, F1 score, ROC-AUC, confusion-matrix analysis, and the probability mean squared error (Brier score).
Results
On the independent test set, the hybrid stacking model achieved a ROC-AUC of 0.911 and an overall accuracy of 80%. The model was highly sensitive to high-risk cases (recall = 0.85). The probability mean squared error (Brier score) was 0.07, indicating good probability calibration. The proposed hybrid framework discriminated better than the baseline models (logistic regression, random forest, and SVM).
Conclusions
The proposed leakage-controlled hybrid ensemble framework delivers strong, clinically meaningful performance for maternal health risk prediction. The results highlight the importance of rigorous validation techniques and probabilistic evaluation measures in healthcare machine learning systems.
Keywords
Introduction
The rapid development of technology has greatly affected the healthcare industry, particularly maternity care. This paper investigates the use of blending models and a modified XGBoost model to improve maternal health risk identification. 1 These methods offer a valuable strategy for the early detection of possible complications, giving medical personnel the information they need to improve patient outcomes. 2
We explore the reasoning behind these models and discuss the implications of using them in contemporary maternal healthcare. Maternal health plays a critical role in the welfare of both the mother and the child. Early diagnosis of high-risk pregnancies can greatly reduce complications during childbirth, protecting the lives of both the mother and the child. 3
In recent years, machine learning algorithms have shown encouraging results across various medical prediction tasks, 4 including predicting high-risk pregnancies and improving maternal health outcomes. The goal of this project is to use modified XGBoost and blending models to build an improved maternal health risk identification system. Our model applies data preprocessing methods such as binarization, label encoding, and SMOTEENN resampling. These methods aim to increase prediction accuracy and make potential high-risk maternal cases easier to detect. The contributions of this study are as follows:
• Data preprocessing techniques, such as label encoding and label binarization, to handle categorical variables and enable machine learning algorithms to be used for risk prediction.
• Use of SMOTEENN for data resampling to address class imbalance in maternal health risk assessment.
• A feature importance model to identify essential risk variables and detect maternal health hazards effectively.
• An improved XGBoost classification model with hyperparameter tuning to increase the accuracy of maternal health risk detection.
• A new blending ensemble prediction model that combines the predictions of multiple base classifiers to improve predictive performance and reliability in detecting maternal health risks.
Structured analytical models are increasingly applied to large-scale real-world health data to support predictive modeling and surveillance of healthcare systems. For example, Nasehi et al. 5 showed how a CRISP-DM-based analytical pipeline can be used for national outpatient prescription surveillance, illustrating the importance of standardized, data-driven approaches in healthcare analytics. Such structured modeling strategies also show why clinical risk prediction needs transparent and reproducible machine learning architectures. We aimed to improve pregnancy care with an improved maternal health risk detection system. The system integrates modified XGBoost and blending models with data preprocessing approaches, enabling early diagnosis of and intervention for high-risk pregnancies. This approach has the potential to save lives and improve maternal and newborn health outcomes. The rest of this paper is organized as follows. Section two reviews related studies and compares them with this one. Section three describes the methods and materials. Section four presents and discusses our findings, and section five concludes the study.
Related studies
This section highlights previous studies related to this research. Muhlis Tahir et al. developed classification algorithms for maternal risk detection. Their study compared two algorithms, deep learning and a neural network, for predicting the risk of preeclampsia in expectant mothers over the course of pregnancy. They used particle swarm optimization (PSO) for feature selection. 6 PSO reduced the number of attributes from 17 to 9. On the reduced dataset, deep learning achieved an accuracy of 95.12% with a shorter execution time than on the original data. That study applied standard (non-hybrid) algorithms, which distinguishes it from ours. We developed hybrid models that combine different types of algorithms within a leakage-controlled validation workflow.
Akhan Akbulut et al. 1 developed machine learning approaches to predict fetal health status from the mother's clinical history. They created a prediction system with assistive e-Health applications for use by practitioners and expectant mothers. The highest prediction accuracy in development testing, 89.5%, was obtained with the Decision Forest model. Real-world testing with 16 participants yielded 87.5%. This level of accuracy is adequate to give a sense of fetal health before the patient sees a doctor. That study used conventional machine learning algorithms, without the hybrid ensemble design and leakage-controlled validation used in the present work.
Vibhakar Mansotra and Sourabh Shastri developed probabilistic data mining classifiers to extract knowledge from maternal health datasets in Jammu and Kashmir State, India. 7 The study used Health Management Information System (HMIS) data from 2014 to 2018. It applied Bayesian TAN and Naïve Bayes classifiers to categorize districts into high-MMR and low-MMR groups. In contrast, the present study develops hybrid ensemble models.
K. S. Betts et al. developed models to predict common postpartum complications in mothers. Gradient-boosted tree models were compared using five-fold cross-validation. The top-performing model for each outcome was then evaluated on independent validation data using the area under the receiver operating characteristic curve (AUC-ROC). 8 In the independent validation data, obstetric surgical wound infection (AUC = 0.856, 95% CI 0.838-0.873) and postpartum hypertensive disorders showed strong discrimination. Postpartum sepsis and hemorrhage showed poor discrimination.
Hoodbhoy et al. proposed applying machine learning algorithms to predict fetal risk. Their study investigated how accurately machine learning strategies detect high-risk pregnancies from cardiotocography (CTG) data. 9 CTG data for 2126 pregnant women were obtained from the University of California, Irvine Machine Learning Repository and used to train ten different machine learning classification models. The XGBoost model achieved the highest prediction accuracy (93%). That study used XGBoost without hyperparameter tuning, whereas we applied a tuned and optimized XGBoost model.
Matthew K. Hoffman et al. developed a machine learning model to predict maternal readmission. 10 The study aimed to design and validate a model predicting postpartum readmission of mothers with complications resulting from hypertensive disorders of pregnancy (HDP). The model performed well in both the derivation cohort (AUC = 0.85) and the validation cohort (AUC = 0.81), and demographics were comparable across the two cohorts. The 31 clinical characteristics used in derivation and validation were found to be highly predictive.
Kenesha Smith Barber et al. 11 used data from the Pregnancy Risk Assessment Monitoring System in the United States. Their large, population-based sample of American women was used to assess the association between postpartum depression symptoms and preterm delivery. They observed a statistically significant association between maternal hopelessness and preterm birth for both very preterm and extremely preterm births (moderate-to-late preterm OR, 1.19).
Ali Raza et al. 12 proposed analyzing maternal health risks using ensemble learning-based feature engineering. The project aimed to develop a system that applies artificial neural networks to detect maternal health hazards from medical data. DT-BiLTCN is a novel deep neural network architecture that combines a temporal convolutional network, a bidirectional long short-term memory network, and decision trees. The synthetic minority over-sampling technique (SMOTE) was used to address class imbalance. Using the feature set produced by DT-BiLTCN, a support vector machine achieved 98% accuracy.
Jean N. 13 applied machine learning models to predict health risks associated with pregnancy. The approach included four stages: modeling, comparative analysis, hyperparameter tuning, and data processing. The following models were used to forecast potential pregnancy-related health hazards: LR, KNN, SVM, ANN, CART, RF, GBM, XGB, LightGBM, and CatBoost. LightGBM and CatBoost gave the best predictions, with 88% accuracy. RF and CART achieved 86% and 87%, respectively.
According to the study "Prediction of Maternal Health Risk with Traditional Machine Learning Methods" by Hursit Burak Mutlu et al., 3 machine learning techniques can be useful in assessing maternal health risks. The study applied six different machine learning techniques to determine maternal health risk and compared their results. The decision tree approach was the most effective, with an accuracy of 89.16%.
Sultana Akter and Md Nurul Raihen 14 carried out a comparative assessment of several machine learning classifiers for determining maternal health risks. The classifiers were LDA, QDA, KNN, decision tree, random forest, bagging, and support vector machine. Their split-validation procedure used 800 observations for training and 214 for testing, and the most reliable model was identified using 10-fold cross-validation. The support vector machine evaluated with 10-fold cross-validation achieved an accuracy of 86.13%, making it the most accurate and efficient of the compared models.
The article by Chaturvedi, Ramesht, et al. discusses the use of kernels for support vector machines in maternal health risk analysis. 15 Machine learning works effectively in the medical field. By learning from medical information, it helps us understand patients' conditions. It also aids in classifying illnesses, detecting abnormalities, predicting the outcomes of disease treatments, and other tasks. The authors used a support vector machine model for maternal health risk analysis, with a testing accuracy of 72.906%.
In summary, most of the studies above applied standard (non-hybrid) algorithms and obtained good results. In this study, we present models designed for diagnosis and detection problems in the medical field, particularly maternal risk with imbalanced datasets. Our approach includes improved preprocessing strategies. These are SMOTEENN, which combines two resampling methods to address class imbalance, and a feature importance threshold used to select the best subset of features. We also implemented optimized XGBoost and blending models. To further improve system performance, we built a hybrid of XGBoost and AdaBoost whose parameters were selected with a genetic algorithm.
Study methods
This medical investigation employs a retrospective cohort study design to evaluate maternal health risks and to build prognostic or diagnostic models using advanced machine learning techniques. A retrospective design allows researchers to assess outcomes, such as mortality or quality of life, by analyzing data from historical medical records.
Model evaluation strategy
The model was evaluated using a stratified 80/20 train/test split with a fixed random seed (42). Stratified five-fold cross-validation on the training set was used for hyperparameter optimization. Confidence intervals for ROC-AUC were estimated through repeated cross-validation resampling. Baseline classifiers (logistic regression, random forest, and support vector machine) were trained and tested under the same preprocessing and validation conditions to ensure fair comparisons.
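The split-then-cross-validate protocol above can be sketched in plain Python. This is a minimal illustration, not the authors' code. The helper names `stratified_split` and `stratified_kfold` are hypothetical, and the labels in the usage example are synthetic. In practice, scikit-learn's `train_test_split(..., stratify=y, random_state=42)` and `StratifiedKFold` would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions preserved in both parts."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(indices, labels, k=5, seed=42):
    """Yield (fit_idx, val_idx) folds drawn only from `indices` (the training set)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class round-robin across the folds
    for f in range(k):
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, sorted(folds[f])


# Hypothetical labels: 600 low-risk (0) and 400 high-risk (1) records.
labels = [0] * 600 + [1] * 400
train_idx, test_idx = stratified_split(labels)
folds = list(stratified_kfold(train_idx, labels))
print(len(train_idx), len(test_idx))  # 800 200
```

Because the folds are built from `train_idx` only, hyperparameter tuning never touches the held-out test rows.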
Data collection
The dataset attribute information.
Preprocessing workflow and leakage control strategy
We gathered raw data in different formats and with different attributes. Categorical characteristics were label encoded, with a distinct integer value assigned to each category. Binarization was used to transform continuous variables into binary values based on a threshold value. 17
The dataset was first split by stratified sampling into 80% training and 20% independent test subsets using a fixed random seed (42). SMOTEENN resampling was applied to the training set only, after the split, to avoid data leakage. The test set was not used in any preprocessing or model development step. Feature selection and hyperparameter tuning were performed with internal cross-validation based solely on the training data. This leakage-controlled workflow ensures an unbiased performance assessment.
Because maternal health data are imbalanced, we used SMOTEENN to rebalance the training data. SMOTEENN combines the Synthetic Minority Over-sampling Technique (SMOTE) with Edited Nearest Neighbours (ENN). 18 The approach was developed by Batista et al. (2004). SMOTE generates synthetic instances of the minority class. ENN then removes observations from either class whose label differs from the majority label of their k nearest neighbors. 19 The procedure therefore oversamples the minority class and cleans both classes, producing a more balanced training set.
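The resampling step can be illustrated with the dependency-free sketch below. It is a simplified reading of SMOTE followed by ENN for a two-class problem, written to show the order of operations: split first, then resample the training rows only. The function names, toy data, and neighbour counts (`k=5` for SMOTE, `k=3` with a majority vote for ENN) are illustrative assumptions. A real pipeline would more likely use `imblearn.combine.SMOTEENN` from the imbalanced-learn package.

```python
import math
import random


def smote(X, y, minority, k=5, seed=42):
    """Add synthetic minority samples until both classes have equal counts (binary case)."""
    rng = random.Random(seed)
    minority_pts = [x for x, lab in zip(X, y) if lab == minority]
    if len(minority_pts) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    n_needed = sum(lab != minority for lab in y) - len(minority_pts)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_pts)
        # k nearest minority neighbours; element 0 (distance 0) is `base` itself
        neighbours = sorted(minority_pts, key=lambda p: math.dist(p, base))[1:k + 1]
        other = rng.choice(neighbours)
        gap = rng.random()  # random point on the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)


def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop samples whose label disagrees with their k-NN majority."""
    keep_X, keep_y = [], []
    for i, (x, lab) in enumerate(zip(X, y)):
        nearest = sorted((math.dist(x, X[j]), y[j]) for j in range(len(X)) if j != i)[:k]
        votes = [v for _, v in nearest]
        if max(set(votes), key=votes.count) == lab:
            keep_X.append(x)
            keep_y.append(lab)
    return keep_X, keep_y


# Hypothetical toy data: 60 majority (0) points near (0, 0), 20 minority (1) near (2, 2).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 60 + [1] * 20

# 1) Split FIRST, so the test rows never influence resampling.
order = list(range(len(X)))
random.Random(42).shuffle(order)
test_rows, train_rows = order[:16], order[16:]
X_train, y_train = [X[i] for i in train_rows], [y[i] for i in train_rows]
X_test, y_test = [X[i] for i in test_rows], [y[i] for i in test_rows]

# 2) Resample the TRAINING data only: SMOTE oversampling, then ENN cleaning.
X_sm, y_sm = smote(X_train, y_train, minority=1)
X_res, y_res = enn(X_sm, y_sm)
print(y_train.count(0), y_train.count(1), "->", y_res.count(0), y_res.count(1))
```

Because ENN may remove samples from either class after oversampling, the cleaned training set is not guaranteed to be exactly balanced.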
Figure 1 shows the outcome label distribution before preprocessing and SMOTEENN resampling, and Figure 2 shows the label distribution after resampling. Before resampling, the outcome label counts were ‘high risk’: 486 and ‘low risk’: 325. After resampling and cleaning, they were ‘high risk’: 486 and ‘low risk’: 486.
Figure 1. The predicted label distribution before SMOTEENN.
Figure 2. The predicted label distribution after SMOTEENN.

The class distribution before and after SMOTEENN.
The results of the threshold method.
The blending model
Ensemble learning, a subfield of machine learning, combines multiple models of the same or different kinds to improve overall model performance. Blending is one of many ensemble techniques. It uses a machine learning model to learn how best to combine the predictions of several ensemble member models, 22 and in some respects it is similar to stacking. Most ensemble learning and blending models are built around a meta-model (level-1 model) that combines the predictions of two or more base models (level-0 models). 23 The meta-model is trained on the base models' predictions for out-of-sample data. This is typically a holdout portion of the training set that the base models did not see. Our model is an advanced form of blending built on the random forest, k-nearest neighbors, and logistic regression algorithms to improve model efficiency. 24
Figure 3 shows the structure of the blending model.
Figure 3. The structure of the blending model.
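To make the level-0/level-1 data flow concrete, here is a small dependency-free blending sketch. It is an illustration under stated assumptions rather than the authors' model. Two very simple stand-in base learners replace the random forest, k-NN, and logistic regression models described above: a k-nearest-neighbour vote and a nearest-centroid score, both hypothetical helpers. The meta-model is a logistic regression fitted by stochastic gradient descent, and the data are synthetic. The key point is that the base models are fitted on one part of the training data, while the meta-model sees only base-model predictions for a separate holdout part.

```python
import math
import random


def knn_proba(fit_X, fit_y, x, k=5):
    """Stand-in base model A: share of positives among the k nearest fitted points."""
    nearest = sorted((math.dist(x, a), lab) for a, lab in zip(fit_X, fit_y))[:k]
    return sum(lab for _, lab in nearest) / k


def centroid_proba(fit_X, fit_y, x):
    """Stand-in base model B: closeness to the positive-class centroid, scaled to [0, 1]."""
    def centroid(label):
        pts = [a for a, lab in zip(fit_X, fit_y) if lab == label]
        return [sum(col) / len(pts) for col in zip(*pts)]
    d0, d1 = math.dist(x, centroid(0)), math.dist(x, centroid(1))
    return d0 / (d0 + d1)  # near 1 when x is much closer to the positive centroid


def sigmoid(z):
    z = max(min(z, 35.0), -35.0)  # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def predict_logistic(w, f):
    return sigmoid(w[0] + sum(wi * fi for wi, fi in zip(w[1:], f)))


def fit_logistic(features, labels, lr=0.5, epochs=300):
    """Meta-model: logistic regression fitted by stochastic gradient descent on log-loss."""
    w = [0.0] * (len(features[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        for f, target in zip(features, labels):
            err = predict_logistic(w, f) - target
            w[0] -= lr * err
            for i, fi in enumerate(f):
                w[i + 1] -= lr * err * fi
    return w


# Synthetic two-class data (hypothetical): class 0 around (0, 0), class 1 around (3, 3).
rng = random.Random(1)
data = [([rng.gauss(c, 1), rng.gauss(c, 1)], lab)
        for c, lab in ((0.0, 0), (3.0, 1)) for _ in range(100)]
rng.shuffle(data)
train, test = data[:160], data[160:]
fit_part, holdout = train[:110], train[110:]  # base models never see the holdout rows
fit_X, fit_y = [x for x, _ in fit_part], [lab for _, lab in fit_part]


def level0_features(x):
    """Level-0 predictions used as inputs to the level-1 (meta) model."""
    return [knn_proba(fit_X, fit_y, x), centroid_proba(fit_X, fit_y, x)]


meta_w = fit_logistic([level0_features(x) for x, _ in holdout], [lab for _, lab in holdout])
correct = sum((predict_logistic(meta_w, level0_features(x)) >= 0.5) == lab for x, lab in test)
accuracy = correct / len(test)
print(f"blended test accuracy: {accuracy:.2f}")
```

Stacking differs mainly in replacing the single holdout split with out-of-fold predictions, so that every training row contributes a meta-feature.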
Genetic algorithm (GA)
Genetic algorithms (GAs) are adaptive heuristic search algorithms. They belong to the larger class of evolutionary algorithms and are based on the principles of natural selection and genetics. 25 GAs perform randomized searches that use historical information to steer the search toward regions of the solution space with better performance. They are regularly used to find high-quality solutions to search and optimization problems. GAs work especially well in complex search spaces where conventional optimization techniques may not perform well. 26
A genetic algorithm can be used to tune XGBoost hyperparameters, searching for the combination that maximizes model performance. We used this technique with our hybrid XGBoost and AdaBoost model to improve its performance. Figure 4 describes the structure of a genetic algorithm.
Figure 4. The structure of a genetic algorithm (GA). 27
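The sketch below shows a generic GA loop (tournament selection, uniform crossover, mutation, and elitism) applied to two boosting hyperparameters. It is illustrative only, and several pieces are assumptions: the search space, the helper names, and especially `fitness`. That function is a toy stand-in for the cross-validated score of the XGBoost/AdaBoost hybrid. The paper does not report its GA settings.

```python
import random

# Hypothetical search space for two boosting hyperparameters.
DEPTHS = range(2, 11)      # candidate max tree depths 2..10
LR_BOUNDS = (0.01, 0.3)    # learning-rate interval


def fitness(ind):
    """Toy stand-in for a cross-validated score; it peaks at depth 6, learning rate 0.1."""
    depth, lr = ind
    return -((depth - 6) ** 2) - 100 * (lr - 0.1) ** 2


def random_individual(rng):
    return (rng.choice(DEPTHS), rng.uniform(*LR_BOUNDS))


def tournament(pop, rng, size=3):
    """Pick the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(pop, size), key=fitness)


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from either parent with probability 0.5."""
    return tuple(ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b))


def mutate(ind, rng, rate=0.2):
    """Small random changes, clipped to the search space."""
    depth, lr = ind
    if rng.random() < rate:
        depth = min(max(depth + rng.choice((-1, 1)), DEPTHS.start), DEPTHS.stop - 1)
    if rng.random() < rate:
        lr = min(max(lr + rng.gauss(0, 0.02), LR_BOUNDS[0]), LR_BOUNDS[1])
    return (depth, lr)


def genetic_search(pop_size=20, generations=30, n_elite=2, seed=42):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        children = pop[:n_elite]  # elitism: the best individuals survive unchanged
        while len(children) < pop_size:
            parent_a, parent_b = tournament(pop, rng), tournament(pop, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = genetic_search()
print("best (max_depth, learning_rate):", best)
```

In a real run, each `fitness` call would train and cross-validate the model on the training data only. This is why GA-based tuning is computationally expensive.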

Hybrid advanced model
Boosting combines a group of weak learners into a strong learner. In AdaBoost, weak learners are trained sequentially on reweighted data, and each learner's vote is weighted according to how well it performed. XGBoost instead adds trees sequentially, fitting each new tree to the gradient of the loss of the current ensemble, with regularization to limit overfitting. We developed an advanced hybrid model combining the XGBoost and AdaBoost algorithms, using grid search and a genetic algorithm to select the best set of parameters. 18 The hybrid approach combines these methods to exploit the benefits of XGBoost, AdaBoost, and genetic algorithms, yielding a more reliable, task-specific ensemble model. The method must be developed and evaluated carefully to ensure its success and prevent overfitting. 24 These methods can be computationally demanding, particularly in a hybrid framework, so time complexity and computational resources should also be considered.
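For reference, the two boosting schemes named above can be written in their standard textbook forms; these are not equations taken from the paper. XGBoost adds a regularized tree f_m at each round, with learning rate η, T leaves, leaf weights w_j, and L1, L2, and split penalties α, λ, and γ. Discrete AdaBoost, with labels y_i ∈ {−1, +1} and sample weights D_i, weights each weak learner h_m by β_m, which is computed from its weighted error ε_m. β_m is often written α_m; it is renamed here to avoid a clash with the L1 coefficient.

```latex
\begin{aligned}
\hat{y}_i^{(m)} &= \hat{y}_i^{(m-1)} + \eta\, f_m(x_i), \qquad
\mathcal{L}^{(m)} = \sum_{i} \ell\big(y_i,\ \hat{y}_i^{(m-1)} + f_m(x_i)\big) + \Omega(f_m), \\
\Omega(f) &= \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} + \alpha \sum_{j=1}^{T} \lvert w_j \rvert, \\
\varepsilon_m &= \frac{\sum_i D_i^{(m)}\, \mathbb{1}\big[h_m(x_i) \neq y_i\big]}{\sum_i D_i^{(m)}}, \qquad
\beta_m = \tfrac{1}{2}\ln\frac{1 - \varepsilon_m}{\varepsilon_m}, \\
D_i^{(m+1)} &\propto D_i^{(m)} \exp\big(-\beta_m\, y_i\, h_m(x_i)\big), \qquad
H(x) = \operatorname{sign}\Big(\sum_{m} \beta_m\, h_m(x)\Big).
\end{aligned}
```

In the xgboost package, α, λ, and γ correspond to the `reg_alpha`, `reg_lambda`, and `gamma` parameters. These are the L1, L2, and gamma settings whose tuned values are reported in the results.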
The results
The results of the XGBoost model.

The confusion matrix of the XGBoost model.
Confusion matrix table of the hybrid model.

The confusion matrix of the hybrid model.
The results of the advanced blending model.

The ROC curve of the blending model.
We also evaluated the advanced hybrid model combining XGBoost and AdaBoost, whose parameters were selected with a genetic algorithm (GA) together with Bayesian grid search. It achieved a probability mean squared error (Brier score) of 0.07 and a ROC-AUC of 0.91, as shown in Figure 8. Table 7 presents the comparison results of our models.
Figure 8. The Prob-MSE (Brier score) of the hybrid model.
Table 7. The comparison results of our models.
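The two probability-based metrics reported here can be computed as follows. This plain-Python sketch covers a binary outcome (1 = high risk) and uses the standard definitions. The Brier score is the mean squared difference between predicted probability and observed outcome, where lower is better and 0 is perfect. ROC-AUC equals the probability that a randomly chosen high-risk case receives a higher score than a randomly chosen low-risk case. The four predictions in the example are made up for illustration.

```python
def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties count as one half (Mann-Whitney formulation); O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for four test cases (1 = high risk).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(round(brier_score(y, p), 4), roc_auc(y, p))  # 0.1581 0.75
```

The two metrics capture different properties. ROC-AUC depends only on the ranking of the scores, while the Brier score also penalizes probabilities that are systematically too high or too low.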
The XGBoost model tuned with Bayesian hyperparameter optimization selected the following settings: maximum tree depth = 8, learning rate = 0.078, subsample ratio = 0.737, column subsampling rate = 0.865, L1 regularization = 0.96, L2 regularization = 7.54, number of boosting estimators = 255, minimum child weight = 6, and gamma = 0.50.
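For reproducibility, the reported settings can be recorded as a parameter dictionary. The sketch below assumes the scikit-learn-style parameter names of the `xgboost` Python package. Mapping "column subsampling rate" to `colsample_bytree`, rather than `colsample_bylevel` or `colsample_bynode`, is our assumption, as is the hypothetical helper `check_params`.

```python
# Reported tuned settings, expressed with xgboost's scikit-learn-style parameter names.
# The name mapping (notably colsample_bytree for "column subsampling rate") is an assumption.
TUNED_XGB_PARAMS = {
    "max_depth": 8,
    "learning_rate": 0.078,
    "subsample": 0.737,
    "colsample_bytree": 0.865,
    "reg_alpha": 0.96,      # L1 regularization on leaf weights
    "reg_lambda": 7.54,     # L2 regularization on leaf weights
    "n_estimators": 255,    # number of boosting rounds
    "min_child_weight": 6,
    "gamma": 0.50,          # minimum loss reduction required to split a node
}


def check_params(params):
    """Hypothetical helper: raise ValueError if a setting is outside its valid range."""
    if not 0 < params["learning_rate"] <= 1:
        raise ValueError("learning_rate must be in (0, 1]")
    for key in ("subsample", "colsample_bytree"):
        if not 0 < params[key] <= 1:
            raise ValueError(f"{key} must be in (0, 1]")
    for key in ("reg_alpha", "reg_lambda", "gamma", "min_child_weight"):
        if params[key] < 0:
            raise ValueError(f"{key} must be non-negative")
    if params["max_depth"] < 1 or params["n_estimators"] < 1:
        raise ValueError("max_depth and n_estimators must be positive")
    return True


check_params(TUNED_XGB_PARAMS)
# With the xgboost package installed, the model could then be built as:
#     from xgboost import XGBClassifier
#     model = XGBClassifier(**TUNED_XGB_PARAMS)
```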
Within the proposed stacking framework, which combines gradient boosting and neural network learners, the hybrid ensemble showed excellent discrimination. On the independent test set (n = 305), the hybrid model achieved a ROC-AUC of 0.911, indicating excellent separation of low- and high-risk maternal cases. The overall classification accuracy was 80%, and the macro-averaged F1 score was 0.80, indicating class-balanced predictive performance.
Class-level evaluation showed a precision of 0.89 and a recall of 0.77 for the low-risk group, and a precision of 0.71 and a recall of 0.85 for the high-risk group. The higher recall for the high-risk category shows that the model is sensitive in detecting at-risk maternal cases, which is clinically desirable in screening applications.
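As a consistency check, the macro-averaged F1 score can be recomputed from the class-level precision and recall reported above. F1 is the harmonic mean of precision and recall, and the macro average weights both classes equally. This small calculation uses only the reported numbers.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Class-level values reported for the hybrid model on the test set.
f1_low = f1(0.89, 0.77)    # low-risk class
f1_high = f1(0.71, 0.85)   # high-risk class
macro_f1 = (f1_low + f1_high) / 2  # unweighted mean over the two classes
print(f"{f1_low:.3f} {f1_high:.3f} {macro_f1:.2f}")  # 0.826 0.774 0.80
```

The result agrees with the reported macro F1 score of 0.80.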
These findings indicate that the stacking-based hybrid framework provides strong, clinically meaningful predictive performance for maternal health risk prediction.
Discussion
In this study, a leakage-controlled hybrid ensemble framework based on optimized XGBoost and stacking strategies was developed and evaluated for predicting maternal health risks. The proposed strategy achieved good discriminative performance on an independent test set (ROC-AUC = 0.911). It also showed balanced predictive performance across classes and clinically meaningful sensitivity for high-risk cases (recall = 0.85). These results show that well-validated ensemble learning models may be useful for maternal risk stratification based on structured healthcare data.
The optimized XGBoost model (ROC-AUC = 0.90) discriminated better than the baseline models (random forest, SVM, and logistic regression), but the hybrid stacking model had stronger overall predictive performance. The gain from stacking suggests that, when heterogeneous learners are aggregated, the meta-model can combine the complementary predictive patterns captured by gradient boosting and neural network representations. Notably, despite an overall accuracy of 80%, the model showed higher recall for high-risk pregnancies. This is especially valuable in screening programs, where reducing false negatives is clinically important.
The Brier score (probability mean squared error = 0.07) suggests good calibration of the predicted probabilities. This implies that the model not only discriminates between risk categories but also provides useful probabilistic estimates, which are fundamental to clinical decision-support systems. In digital health applications, both discrimination and calibration are required for reliable implementation.
Methodologically, this research highlights the need to avoid data leakage in healthcare machine learning. We ensured unbiased evaluation by performing stratified splitting before SMOTEENN resampling and by tuning hyperparameters on the training data only. High accuracy values reported in the previous literature are often not accompanied by a clear description of leakage control strategies. The revised pipeline used in this study addresses this issue and improves reproducibility.
Prior maternal health risk prediction studies have reported accuracies of 86% to 91%. Our findings show similar or better discrimination while being methodologically transparent. This work prioritizes validation rigor and clinical interpretability over inflated accuracy values. Overall, the results support the feasibility of ensemble-based maternal risk prediction systems. They also emphasize the need to integrate discrimination measures, calibration evaluation, and leakage-controlled validation in clinical machine learning studies.
Beyond maternal health risk prediction, structured machine learning frameworks have been used effectively in population-level health analytics. Recent studies based on real-world health insurance data have proved useful for medication safety assessment and pharmacoepidemiological monitoring. 28 These results highlight the broader translational potential of standardized predictive modeling pipelines within digital health ecosystems. The present research fits within this paradigm, applying a leakage-controlled ensemble model that follows reproducible analysis standards. 29
The predictive variables in the model are clinically relevant to monitoring maternal health. Systolic and diastolic blood pressure are established predictors of pregnancy complications such as preeclampsia and gestational hypertension. Elevated blood pressure during pregnancy is closely linked to maternal morbidity and poor fetal outcomes. Blood glucose level is also important in diagnosing gestational diabetes, which increases the risk of maternal complications and adverse child health outcomes. Body temperature and heart rate can indicate physiological stress or infection during pregnancy, which can raise maternal risk. Age is another clinically meaningful factor, since both very young and older maternal ages are known to predispose to pregnancy-related risks. The predictive characteristics used by the proposed model therefore correspond to known obstetric risk factors, which supports the clinical interpretability of the machine learning predictions.
Overall, integrating machine learning-based maternal risk prediction systems into routine prenatal care may improve early screening, support evidence-based clinical decision-making, and enable timely detection of high-risk pregnancies. Such systems provide valid risk stratification from routinely collected clinical parameters. This can enhance the quality of maternal healthcare delivery, especially in low-resource settings, while final clinical decisions remain under the control of medical personnel.
Ethical considerations
The development and use of machine learning models in medicine require careful attention to ethical and responsible AI principles. The dataset used in this study is publicly available and anonymized, and no identifiable patient information was obtained. Therefore, institutional review board approval was not required. Nevertheless, ethical concerns remain significant when predictive models are implemented in clinical practice.
First, patient confidentiality and information security should be priorities when incorporating machine learning systems into electronic health records. Second, algorithmic bias requires careful attention, especially when a model is trained on small datasets that may not represent diverse populations. External validation on multi-center clinical data is therefore needed to ensure fairness and generalizability. Third, transparency and interpretability are crucial for clinical AI systems, because health workers need to understand how predictions are made. Finally, predictive models should be used as decision-support tools rather than as autonomous decision-makers, so that final medical decisions remain with qualified healthcare professionals.
Addressing these ethical considerations is essential for the responsible integration of artificial intelligence technologies into maternal healthcare systems.
Limitations
This research has several limitations. The dataset (n = 1015) is modest in size and was obtained from a single open source. Its generalizability should therefore be confirmed through external validation on multi-center clinical data. In addition, only structured tabular features were considered; adding longitudinal clinical and demographic variables could improve predictive performance.
The conclusion
This study described a leakage-controlled hybrid ensemble framework for predicting maternal health risks using optimized XGBoost and stacking-based models. Methodological transparency and unbiased assessment were achieved through three measures: a stratified 80/20 validation split, application of SMOTEENN to the training data only, and cross-validated hyperparameter tuning. The proposed hybrid model achieved high discriminative performance (ROC-AUC = 0.911), balanced class-specific predictive performance, and clinically meaningful sensitivity for high-risk maternal cases. In addition, the probabilistic calibration evaluation showed credible risk estimates, supporting the model's potential relevance for decision support. Rather than focusing on inflated accuracy scores, this study emphasizes rigorous validation design, leakage avoidance, and probabilistic analysis in digital health machine learning research. The results suggest that ensemble-based models can offer effective and clinically meaningful maternal risk stratification when deployed through reproducible and transparent workflows.
Future research should externally validate the model with multi-center clinical data and integrate it into operational digital health systems to evaluate clinical utility and implementation.
Ethical considerations
The study used a publicly available, anonymized dataset from the UCI Machine Learning Repository. No identifiable personal information was used; hence, ethical approval and informed consent were not required.
Footnotes
Acknowledgements
The authors are grateful to the researchers and institutions that publicly released the medical dataset that made this study possible. The authors also thank the wider research community, whose members have advanced data-driven healthcare. Special thanks go to the reviewers and editorial team, whose comments and constructive feedback greatly improved the quality and clarity of this manuscript.
Author contributions
H.K.S.A. conceived the study, developed the methodology, analyzed the data, and wrote the manuscript.
A.A.I. supervised the research, revised the manuscript, and validated the results.
All authors reviewed and approved the final manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The dataset is available from the UCI Machine Learning Repository. Code is available on reasonable request.
Guarantor
This study is guaranteed by the corresponding author, and all authors of this study take full responsibility for the integrity of the data and the accuracy of the analysis.
