Development of an advanced maternal health risk detection system using enhanced XGBOOST and blending models

Abstract

Objective

Maternal health risks should be identified early so that pregnancy outcomes can be improved and complications during pregnancy minimized. This paper designs and tests a leakage-controlled hybrid machine learning model that predicts maternal health risk using optimized ensemble models.

Methods

We used the publicly available UCI Maternal Health Risk dataset (n = 1015). The dataset was split by stratified sampling with a fixed random seed (42) into 80% training and 20% independent test subsets. SMOTEENN resampling was applied only to the training data to avoid data leakage. Internal cross-validation was used for hyperparameter tuning. We developed optimized XGBoost, blending, and hybrid stacking models. Model performance was measured in terms of accuracy, precision, recall, F1 score, ROC-AUC, confusion matrix analysis, and probability mean squared error (Brier score).

Results

The hybrid stacking model achieved a ROC-AUC of 0.911 and an overall accuracy of 80% on the independent test set. The model was highly sensitive to high-risk cases (recall = 0.85). The probability mean squared error (Brier score) was 0.07, indicating good probability calibration. The proposed hybrid framework showed better discriminative capability than the baseline models (logistic regression, random forest, and SVM).

Conclusions

The proposed leakage-aware hybrid ensemble framework offers strong and clinically meaningful performance in maternal health risk prediction. The results show the importance of sound validation techniques and probabilistic evaluation measures in healthcare machine learning systems.

Keywords

Maternal disease, pregnancy risk detection, SMOTEENN, XGBoost, ensemble learning, genetic algorithm, hybrid model

Introduction

Rapid technological development has greatly affected the healthcare industry, especially maternity care. This paper investigates the use of blending models and a modified XGBoost to improve maternal health risk identification.¹ These methods offer a valuable strategy for the early detection of possible problems, giving medical personnel the knowledge they need to improve patient outcomes.² We describe the reasoning behind these models and discuss the consequences of using them in contemporary maternal healthcare. Maternal health plays a critical role in the welfare of both the mother and the child. Complications during pregnancy and childbirth can be greatly reduced by early diagnosis of high-risk pregnancies, protecting the lives of both the mother and the child.³ In recent years, machine learning algorithms have shown encouraging results in a range of medical prediction tasks,⁴ including predicting high-risk pregnancies and improving maternal health outcomes. The goal of this project is to use modified XGBoost and blending models to create an enhanced maternal health risk identification system. Through data preprocessing methods such as binarization, label encoding, and the SMOTEENN approach, our model seeks to increase prediction accuracy and make possible high-risk maternal cases easier to detect. The contributions of the study are as follows:

• Data preprocessing techniques such as label encoding and label binarization to manage categorical variables and enable the use of machine learning algorithms for risk prediction.

• Use of SMOTEENN for data resampling to address class imbalance in maternal health risk assessment.

• Design of a feature importance model to identify essential risk variables and detect maternal health hazards effectively.

• Development of an improved XGBoost classification model with hyperparameter tuning to enhance the accuracy of detecting maternal health risks.

• A new blending ensemble prediction model that combines the predictions of multiple base classifiers to improve predictive performance and reliability in detecting maternal health risks.

Structured analytical models are increasingly applied to large-scale real-world health data to support predictive modeling and surveillance of healthcare systems. For example, Nasehi et al.⁵ illustrated how a CRISP-DM-based analytical pipeline can be used in national outpatient prescription surveillance and showed that such standardized data-driven approaches are important in healthcare analytics. Such structured modeling strategies demonstrate the need for transparent and reproducible machine learning architectures in clinical risk prediction settings. We aimed to improve pregnancy care by building an improved maternal health risk detection system that integrates modified XGBoost and blending models with data preprocessing approaches to enable early diagnosis and intervention for high-risk pregnancies. This method has considerable promise for saving lives and improving maternal and newborn health outcomes. The remainder of this paper is structured as follows. Section two reviews related studies and compares them with this one. Section three describes the materials and methods. Section four presents and discusses our findings, and section five concludes the study.

Related studies

This section highlights previous studies related to this research. Muhlis Tahir et al. developed classification algorithms for maternal risk detection. The goal of their study was to examine the outcomes of two algorithms, deep learning and a neural network, in predicting the risk of preeclampsia in expectant mothers over the course of pregnancy. The feature selection approach they employed was particle swarm optimization (PSO).⁶ Their experiment showed that PSO reduced the number of attributes from 17 to 9. On the reduced dataset, deep learning achieved an accuracy of 95.12% and a shorter execution time than on the original data. That paper used standard forms of the algorithms, which distinguishes it from our study. We developed machine learning models in an advanced hybrid form by combining different types of algorithms, which gave us better results and higher accuracy than the previous ones.

Akhan Akbulut et al.¹ developed machine learning approaches to predict fetal health status based on the clinical history of the mother. In that study, they created a prediction system with assistive e-Health applications that practitioners and expectant mothers could use. The highest prediction accuracy recorded in development testing was 89.5%, obtained with the Decision Forest model. Sixteen individuals participated in real-world testing, where the performance was 87.5%. This level of accuracy is adequate to give a sense of the health of the fetus before the patient sees the doctor. The study used standard machine learning algorithms and obtained lower accuracy and system performance than ours.

Vibhakar Mansotra and Sourabh Shastri developed probabilistic data mining classifiers to extract knowledge from maternal health datasets in Jammu and Kashmir State, India.⁷ The study used Health Management Information System (HMIS) data from 2014 to 2018 to categorize districts into high MMR and low MMR using Bayesian TAN and Naïve Bayes classifiers. In comparison, our models include advanced hybrid techniques and improve on those previously developed.

KS Betts et al. developed methods to predict common postpartum complications in mothers. Model performance was compared by five-fold cross-validation using gradient-boosted trees. The top-performing models for each outcome were then assessed on independent validation data using the area under the receiver operating characteristic curve (AUC-ROC).⁸ In the independent validation data, obstetric surgical wound infection (AUC = 0.856, 95% CI 0.838-0.873) and postpartum hypertensive disorders showed strong discrimination, whereas postpartum sepsis and hemorrhage showed poor discrimination.

Hoodbhoy et al. proposed the application of machine learning algorithms to predict fetal risk. Their study set out to investigate the accuracy of machine learning strategies in detecting high-risk pregnancies using cardiotocography (CTG) data.⁹ The University of California, Irvine Machine Learning Repository provided the CTG data for 2126 pregnant women. The CTG data were used to train ten distinct machine learning classification models. The classification model built with XGBoost had the highest prediction accuracy, at 93%. From our reading of that paper, XGBoost was used without hyperparameter tuning, whereas we applied a tuned, advanced form of XGBoost and obtained the highest accuracy with our model.

Matthew K. Hoffman et al. designed and validated a machine learning model to predict postpartum readmission of mothers with complications arising from hypertensive disorders of pregnancy (HDP).¹⁰ The performance characteristics of the model (derivation cohort AUC = 0.85; validation cohort AUC = 0.81) and the demographics were comparable across the two cohorts. The 31 clinical characteristics used in the derivation and validation were found to be highly predictive.

Kenesha Smith Barber et al.¹¹ conducted a study using data from the Pregnancy Risk Assessment Monitoring System in the United States. The study used a sizable, population-based sample of American women to assess the association between postpartum depression symptoms and preterm delivery. They observed a statistically significant association between maternal hopelessness and preterm birth for both very preterm and extremely preterm births (moderate-to-late preterm OR, 1.19).

Ali Raza et al. propose analyzing maternal health risks using ensemble learning-based feature engineering.¹² Their project aims to develop a system that applies artificial neural networks to detect maternal health hazards by examining medical data. DT-BiLTCN is a novel deep neural network architecture that combines decision trees, a bidirectional long short-term memory network, and a temporal convolutional network. The synthetic minority over-sampling technique is used to overcome class imbalance. The support vector machine reached an accuracy of 98%, with DT-BiLTCN providing the feature set used to achieve this result.

Jean N.¹³ applied machine learning models to predict the health risks associated with pregnancy. The four stages of the approach are modeling, comparative analysis, hyperparameter tuning, and data processing. The following machine learning models were used to forecast potential health hazards associated with pregnancy: LR, KNN, SVM, ANN, CART, RF, GBM, XGB, LightGBM, and CatBoost. The LightGBM and CatBoost algorithms produced the best predictions, with 88% accuracy. The RF and CART algorithms had accuracies of 86% and 87%, respectively.

In the study Prediction of Maternal Health Risk with Traditional Machine Learning Methods, Hursit Burak MUTLU et al.³ argue that machine learning techniques can be useful in assessing maternal health risks. They determined maternal health risk using six distinct machine learning techniques, compared the results, and found the decision tree approach to be the most effective, with an accuracy of 89.16%.

Sultana Akter and Md Nurul Raihen¹⁴ present a comparative assessment of several machine learning classifiers for determining maternal health risks. The algorithms assessed include LDA, QDA, KNN, decision tree, random forest, bagging, and support vector machine. In the split validation procedure, they used 214 observations for testing and 800 for training. The most reliable model was also determined using 10-fold cross-validation. The support vector machine with 10-fold cross-validation reached an accuracy of 86.13%, making it more accurate and efficient than all the other models compared.

The article by Chaturvedi, Ramesht, et al. discusses the use of kernels in support vector machines for maternal health risk analysis.¹⁵ Machine learning works effectively in the medical field: by making use of medical information, it helps in understanding the circumstances of patients, and it also aids in classifying illnesses, detecting abnormalities, predicting treatment outcomes, and other tasks. The authors used a support vector machine model for maternal health risk analysis, and its testing accuracy was 72.906%.

Most of the studies reviewed above applied standard forms of their algorithms and obtained good results. In this study, we present models designed for diagnosis and detection problems in medicine, particularly maternal risk and imbalanced datasets. Our advanced models achieved high accuracy, supported by improved preprocessing strategies: SMOTEENN, which combines two resampling methods to handle class imbalance, and feature importance used as a threshold to choose the best subset of features. In addition, we implemented advanced XGBoost and blending models to improve on previous results and, to increase system performance, used a hybrid of XGBoost and AdaBoost with a genetic algorithm that selected the best parameters for the hybrid model.

Study methods

To evaluate maternal health risks and create prognostic or diagnostic models using machine learning techniques, this study employs a retrospective cohort design. A retrospective method allows researchers to assess outcomes such as mortality or quality of life by analyzing data from historical medical records.

Model evaluation strategy

The models were evaluated using a stratified 80/20 train/test split with a fixed random seed (42). Five-fold stratified cross-validation on the training set was used for hyperparameter optimization. Confidence intervals for ROC-AUC were estimated through repeated cross-validation resampling. Baseline classifiers (logistic regression, random forest, and support vector machine) were trained and tested under the same preprocessing and validation conditions to ensure fair comparison.
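The split-then-validate order described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the helper names `stratified_split` and `stratified_folds` are hypothetical, and the toy labels only mimic the class totals implied by Table 2 (608 samples of one class and 406 of the other). In practice, scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold` serve the same purpose.

```python
import random
from collections import defaultdict

SEED = 42  # fixed seed reported in the paper


def stratified_split(labels, test_fraction=0.2, seed=SEED):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(indices, labels, k=5, seed=SEED):
    """Deal each class's shuffled indices round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


# Toy labels (assumption): class totals implied by Table 2.
labels = [0] * 608 + [1] * 406
train_idx, test_idx = stratified_split(labels)
# Cross-validation folds are built from training indices only,
# so the test indices never influence tuning.
folds = stratified_folds(train_idx, labels)
```

With these toy labels, the split reproduces the 811/203 partition of Table 2, and every fold keeps roughly the same class ratio as the full training set.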

Data collection

In this study, we used the publicly available Maternal Health Risk dataset from the UCI Machine Learning Repository (Ahmed et al.). The data are anonymized and publicly available for research. No hospital or proprietary data were used; thus, institutional review board approval and informed consent were not required.¹⁶ The dataset includes 7 columns and 1015 instances. It contains information on seven feature variables: blood glucose level, body temperature (in degrees Fahrenheit), heart rate, estimated risk intensity during pregnancy for pregnant women (n=1014), age, systolic and diastolic blood pressure, and body mass index. The original data had three outcome categories (low, mid, and high maternal risk). Here, the problem was converted into a binary classification by merging the mid-risk and high-risk categories. This choice places more emphasis on detecting higher-risk pregnancies, which have been linked to adverse fetal and maternal outcomes. The resulting binary labels were coded 1 (low risk) and 0 (mid/high risk). Table 1 displays the characteristics of the dataset.

Table 1.

The dataset attributes information.

Attribute name	Attribute information
Age	The age of the pregnant woman, expressed in years.
SystolicBP	Systolic (upper) blood pressure, measured in millimeters of mercury (mmHg); an important factor to consider during pregnancy.
DiastolicBP	Diastolic (lower) blood pressure, measured in millimeters of mercury (mmHg); another important factor to consider during pregnancy.
BS	Blood glucose level, expressed in millimoles per liter (mmol/L).
Heart Rate	The heart rate at rest, measured in beats per minute.
Risk Level	The anticipated level of risk intensity during pregnancy, based on the preceding characteristics.
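The merging of outcome categories described in the text can be written as a simple lookup. This is a sketch only: the raw label strings (`"low risk"`, `"mid risk"`, `"high risk"`) are assumed spellings of the three categories, and `to_binary` is a hypothetical helper name. The numeric coding follows the statement above (low risk = 1, mid/high risk = 0).

```python
# Coding stated in the text: low risk -> 1, mid or high risk -> 0.
# The label strings are an assumption about how the raw column is spelled.
BINARY_CODE = {"low risk": 1, "mid risk": 0, "high risk": 0}


def to_binary(risk_levels):
    """Map three-level risk labels to the binary target used in the study."""
    return [BINARY_CODE[level.strip().lower()] for level in risk_levels]


example = to_binary(["low risk", "High Risk", "mid risk ", "low risk"])
```

An unknown label raises a `KeyError`, which is usually preferable to silently assigning a class.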

Preprocessing workflow and leakage control strategy

We gathered raw data with different formats and attributes from various sources. For categorical characteristics, labeling was carried out with label encoding, which assigns a distinct integer value to each category. The dataset was first split by stratified sampling with a fixed random seed (42) into 80% training and 20% independent test subsets. SMOTEENN resampling was applied to the training set only, after the split, to avoid data leakage. The test set played no part in preprocessing or model development. Internal cross-validation, based solely on the training data, was used to select features and tune hyperparameters. This leakage-controlled workflow was used to ensure unbiased performance assessment. Binarization with a threshold value was used to transform continuous variables into binary values.¹⁷ Because maternal health data are imbalanced, we used the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbours (SMOTEENN) to rebalance the training data.¹⁸ The approach was developed by Batista et al. (2004). It uses SMOTE to generate synthetic instances of the minority class and ENN to remove observations from either class whose label differs from the majority class of their K nearest neighbors.¹⁹ The resulting balanced dataset is produced by oversampling minority-class instances and undersampling majority-class instances.
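The two stages of the resampling procedure can be sketched in plain Python as follows. This is an illustrative simplification under stated assumptions, not the implementation used in the study and not the imbalanced-learn code (which provides `imblearn.combine.SMOTEENN`): the function names, the neighbourhood sizes (k = 5 for SMOTE, k = 3 for ENN), and the toy two-dimensional data are all assumptions, and the sketch expects at least two minority samples.

```python
import math
import random
from collections import Counter


def nearest(point, candidates, k):
    """Indices of the k candidates closest to point (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def smote(X, y, k=5, seed=42):
    """Oversample the minority class by interpolating between minority neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    needed = max(counts.values()) - counts[minority]
    pool = [x for x, label in zip(X, y) if label == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(needed):
        base = rng.choice(pool)
        # Skip position 0: the closest minority point to base is base itself.
        neighbours = nearest(base, pool, k + 1)[1:]
        partner = pool[rng.choice(neighbours)]
        gap = rng.random()
        X_new.append(tuple(b + gap * (p - b) for b, p in zip(base, partner)))
        y_new.append(minority)
    return X_new, y_new


def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label disagrees with the majority of their k neighbours."""
    keep_X, keep_y = [], []
    for i, (point, label) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        closest = sorted(others, key=lambda j: math.dist(point, X[j]))[:k]
        majority = Counter(y[j] for j in closest).most_common(1)[0][0]
        if majority == label:
            keep_X.append(point)
            keep_y.append(label)
    return keep_X, keep_y


# Toy training data (assumption): 40 majority and 15 minority points.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(15)]
y = [0] * 40 + [1] * 15

X_bal, y_bal = smote(X, y)                                  # step 1: oversample
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal)  # step 2: clean
```

Both functions would be called on the training subset only; the test subset is never passed to them, which is the leakage-control point made in the text.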

Figure 1 shows the distribution of the outcome label before preprocessing with the SMOTEENN technique, and Figure 2 shows the distribution after it. According to the class counter, the distribution of the outcome label was (‘high risk’: 486, ‘low risk’: 325) before resampling and (‘high risk’: 486, ‘low risk’: 486) after resampling and cleaning.

Figure 1.

The outcome label distribution before SMOTEENN.

Figure 2.

The outcome label distribution after SMOTEENN.

In addition, we used feature importance as a threshold to improve system performance and obtain a better subset of features.²⁰ In Information Gain (IG), the reference value used to select features is called the threshold or cut-off. The threshold is either set to 0.05 or chosen independently. To obtain the final threshold value, Tsai and Sung computed and averaged each frequency. Tsai's concept also makes it possible to use the standard deviation to determine the threshold value.²¹ Table 2 shows the class distribution before and after resampling, and Table 3 presents the results of the threshold method, with a total model accuracy of 86%.

Table 2.

The class distribution before and after SMOTEENN.

Dataset split	High risk (class 0)	Low risk (class 1)	Total samples
Training (before SMOTEENN)	486	325	811
Training (after SMOTEENN)	486	486	972
Test set (unmodified)	122	81	203

Table 3.

Results of the threshold method.

Threshold and number of attributes	Accuracy
Thresh=0.078, Attr.n=6	88.67%
Thresh=0.089, Attr.n=5	91.63%
Thresh=0.099, Attr.n=4	86.70%
Thresh=0.149, Attr.n=3	87.19%
Thresh=0.200, Attr.n=2	75.86%
Thresh=0.385, Attr.n=1	73.89%
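A threshold sweep of the kind summarized in Table 3 can be sketched as below. The six threshold values are taken from Table 3; they sum to 1.0, which is consistent with each threshold being one feature's normalized importance, although the paper does not state this explicitly. The feature names are placeholders because the paper does not say which attribute carries which importance, and `select_by_threshold` is a hypothetical helper (scikit-learn offers a comparable `SelectFromModel` with a `threshold` parameter).

```python
# Importance values copied from the thresholds in Table 3.
# Feature names are placeholders: the source does not map values to attributes.
importances = {
    "feature_1": 0.385,
    "feature_2": 0.200,
    "feature_3": 0.149,
    "feature_4": 0.099,
    "feature_5": 0.089,
    "feature_6": 0.078,
}


def select_by_threshold(importances, threshold):
    """Keep every feature whose importance is at least the threshold."""
    return [name for name, score in importances.items() if score >= threshold]


# Use each importance value in turn as the cut-off, from loosest to strictest.
sweep = {t: select_by_threshold(importances, t)
         for t in sorted(importances.values())}

for threshold, kept in sweep.items():
    print(f"Thresh={threshold:.3f}, Attr.n={len(kept)}")
```

In a full pipeline, a model would be refit on each selected subset and scored by cross-validation on the training data, so that the chosen threshold does not depend on the test set.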

The blending model

Ensemble learning is a subfield of machine learning that combines multiple models, of the same or different kinds, to improve overall effectiveness. Blending is one of many ensemble techniques: it uses a machine learning model to learn how best to combine the predictions of several ensemble members,²² and it is similar to stacking in some respects. As in stacking, two or more base models (level-0 models) feed a meta-model (level-1 model) that combines their predictions.²³ The meta-model is trained on the predictions that the base models make for out-of-sample (holdout) data. Our model is an advanced form of blending built on the random forest, k-nearest neighbors, and logistic regression algorithms to improve model efficiency.²⁴ Figure 3 shows the structure of the blending model.

Figure 3.

The structure of the blending model.
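The data flow of blending (fit base models on one part of the training data, then train the meta-model on their holdout predictions) can be sketched without external libraries. This is a simplified illustration, not the study's model: the base learners here are a k-nearest-neighbour scorer and a nearest-centroid scorer standing in for the random forest, k-NN, and logistic regression base models, the meta-model is a small logistic regression trained by gradient descent, and the data are synthetic. All function names are hypothetical.

```python
import math
import random


def make_blobs(n_per_class, rng):
    """Two overlapping 2-D Gaussian classes (toy stand-in for real features)."""
    data = []
    for label, centre in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    rng.shuffle(data)
    return data


def knn_model(train, k=5):
    """Level-0 model: fraction of the k nearest training points labelled 1."""
    def predict(x):
        nearest = sorted(train, key=lambda row: math.dist(x, row[0]))[:k]
        return sum(label for _, label in nearest) / k
    return predict


def centroid_model(train):
    """Level-0 model: score from relative distance to the two class centroids."""
    def centroid(cls):
        pts = [x for x, label in train if label == cls]
        return tuple(sum(c) / len(pts) for c in zip(*pts))
    c0, c1 = centroid(0), centroid(1)

    def predict(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 else 0.5
    return predict


def fit_logistic(features, targets, epochs=500, lr=0.5):
    """Level-1 meta-model: logistic regression trained by gradient descent."""
    weights, bias, n = [0.0] * len(features[0]), 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, target in zip(features, targets):
            z = sum(w * v for w, v in zip(weights, row)) + bias
            error = 1 / (1 + math.exp(-z)) - target
            grad_w = [g + error * v for g, v in zip(grad_w, row)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return lambda row: 1 / (1 + math.exp(
        -(sum(w * v for w, v in zip(weights, row)) + bias)))


rng = random.Random(42)
data = make_blobs(150, rng)
fit_part, holdout, test = data[:150], data[150:225], data[225:]

# Base models see only fit_part; the meta-model sees only holdout predictions.
base_models = [knn_model(fit_part), centroid_model(fit_part)]
meta_X = [[m(x) for m in base_models] for x, _ in holdout]
meta = fit_logistic(meta_X, [label for _, label in holdout])

predictions = [int(meta([m(x) for m in base_models]) >= 0.5) for x, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
```

Keeping the holdout part separate from the data used to fit the base models is what distinguishes blending from simply averaging in-sample predictions.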

Genetic algorithm (GA)

Genetic algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger class of evolutionary algorithms. They are based on natural selection and genetics.²⁵ They make intelligent use of random search, guided by historical data, to direct the search toward regions of the solution space where performance is better. They are regularly used to find high-quality solutions to search and optimization problems. GAs work especially well in complicated search spaces where more conventional optimization techniques might not perform well.²⁶ A genetic algorithm may be used to optimize XGBoost model parameters, determining the collection of hyperparameters that maximizes model performance. We used this technique with our hybrid model of XGBoost and AdaBoost to increase model performance. Figure 4 describes the structure of a genetic algorithm.

Figure 4.

The structure of a genetic algorithm (GA).²⁷
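The selection, crossover, and mutation loop can be sketched in a few lines. This is a generic illustration under stated assumptions rather than the configuration used in the study: the search ranges in `SPACE` are hypothetical (the paper does not list them), the parameters are treated as continuous for simplicity, and `fitness` is a stand-in objective with its optimum at the middle of each range. In a real run, `fitness` would return a cross-validated score of the hybrid model trained with the candidate hyperparameters.

```python
import random

# Hypothetical search space; the paper does not list the ranges it searched.
SPACE = {
    "max_depth": (2.0, 10.0),
    "learning_rate": (0.01, 0.30),
    "subsample": (0.5, 1.0),
}


def random_individual(rng):
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SPACE.items()}


def fitness(ind):
    """Stand-in objective (higher is better), peaking at each range's midpoint."""
    return -sum(((ind[n] - (lo + hi) / 2) / (hi - lo)) ** 2
                for n, (lo, hi) in SPACE.items())


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with equal chance."""
    return {n: (a if rng.random() < 0.5 else b)[n] for n in SPACE}


def mutate(ind, rng, rate=0.2):
    """Gaussian mutation, clipped to the search bounds."""
    child = dict(ind)
    for n, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            child[n] = min(hi, max(lo, child[n] + rng.gauss(0, 0.1 * (hi - lo))))
    return child


def tournament(pop, rng, size=3):
    """Pick the fittest of a few randomly drawn individuals."""
    return max(rng.sample(pop, size), key=fitness)


def run_ga(generations=30, pop_size=20, seed=42):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = [elite]  # elitism: the best individual always survives
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga()
```

Because the elite individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases.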

Hybrid advanced model

Boosting is the process of combining a group of weak learners to create a strong learner. XGBoost repeatedly builds weak models on portions of the data, and each weak prediction is weighted according to how well the weak learner performed. We built an advanced hybrid model that includes the XGBoost and AdaBoost algorithms, using grid search and a genetic algorithm to set the best subset of parameters.¹⁸ The hybrid approach combines these methods to draw on the benefits of XGBoost, AdaBoost, and genetic algorithms and to provide a more reliable, task-specific ensemble model. For a hybrid approach to succeed without overfitting, it must be developed and evaluated carefully.²⁴ Because these methods can be computationally demanding, particularly within a hybrid framework, time complexity and computational resources should also be considered.

The results

To obtain a reliable estimate of performance, the model was optimized on the training data using 5-fold stratified cross-validation, and the final evaluation was done on a separate independent test set (20%). Accuracy, precision, recall, F1-score, and ROC-AUC were used to measure model performance, along with the Brier score to measure probability calibration. Cross-validation resampling was used to estimate the confidence interval of ROC-AUC, giving a more robust comparison of predictive performance among models. The data were split into 80% training and 20% test. For the advanced XGBoost model, validation_0-error denotes the error rate (misclassification rate) on the validation set at each boosting round; lower values indicate better performance. Training was terminated at the 12th iteration (boosting round), which achieved the lowest validation error of 0.018939, or around 1.89%. The corresponding overall accuracy on the validation set is 98.11% (0.9811). Both the weighted-average and macro-average F1 scores are 0.98, which indicates strong overall performance in both classes. In summary, the XGBoost model demonstrated good classification capacity on the validation set, with high accuracy, precision, recall, and F1-score for both classes. Table 4 presents the classification report for the XGBoost model, and Figure 5 presents its confusion matrix.

Table 4.

The results of the XGBoost model.

Class name	Precision	Recall	F1-score	Support
High risk class (0)	0.93	0.82	0.87	122
Low risk class (1)	0.77	0.90	0.83	81
Avg.	0.86	0.85	0.85	203

Figure 5.

The confusion matrix of the XGBoost model.

In Table 5 and Figure 6, the confusion matrix shows that the hybrid model correctly identified 141 low-risk and 104 high-risk cases. Eighteen high-risk pregnancies were wrongly classified as low risk, whereas 42 low-risk pregnancies were predicted to be high risk. The model is highly sensitive (recall = 0.85) because of the relatively low number of false negatives in the high-risk group (18 cases); this is clinically important because it enables the timely identification of maternal risks. In addition, we used the advanced blending model, which reached an accuracy of 85.22%. A blending model combines the predictions of several other models, here KNeighborsClassifier, RandomForestClassifier, and Logistic Regression, to produce a final prediction. The blending model demonstrated effective classification across the dataset, with high accuracy, precision, recall, and F1-score for both classes. Table 6 reports the results of the blending model, and Figure 7 shows its ROC curve.

Table 5.

Confusion matrix table of hybrid model.

	Predicted low	Predicted high
Actual Low	141	42
Actual High	18	104

Figure 6.

The confusion matrix of hybrid model.

Table 6.

The results of the advanced blending model.

Class name	Precision	Recall	F1-score	Support
High risk class (0)	0.90	0.93	0.91	122
Low risk class (1)	0.88	0.84	0.86	81
Avg.	0.89	0.89	0.89	203

Figure 7.

The ROC curve of the blending model.

Moreover, we used an advanced hybrid model combining XGBoost and AdaBoost, in which a genetic algorithm (GA) was applied together with Bayesian grid search to choose the best parameters of the hybrid model. The Prob-MSE (Brier score) was 0.07 and the ROC-AUC was 91%, as presented in Figure 8. Table 7 presents the comparison of our models.

Figure 8.

The Prob-MSE (Brier) of the hybrid model.
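For reference, the Brier score (probability mean squared error) is the mean of the squared differences between predicted probabilities and observed binary outcomes: BS = (1/N) Σ (p_i − y_i)², where lower values indicate better calibrated and sharper probabilities. The short sketch below computes it on made-up values chosen purely for illustration; they are not the predictions of the hybrid model. scikit-learn exposes the same quantity as `brier_score_loss`.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up outcomes and probabilities for illustration only.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.6, 0.1]
score = brier_score(y_true, y_prob)

# An uninformative model that always predicts 0.5 scores 0.25.
baseline = brier_score(y_true, [0.5] * len(y_true))
```

The 0.25 baseline of a constant 0.5 prediction gives a useful point of comparison when reading a reported value such as 0.07.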

Table 7.

The comparison results of our models.

Model	Accuracy	ROC-AUC
Logistic Regression	0.74	0.83
Random Forest	0.78	0.88
SVM	0.76	0.86
XGBoost	0.82	0.90
Blending	0.89	0.88
Hybrid Model	0.80	0.911

Bayesian hyperparameter optimization selected the following settings for the tuned XGBoost model: maximum tree depth = 8, learning rate = 0.078, subsample ratio = 0.737, column subsampling rate = 0.865, L1 regularization = 0.96, L2 regularization = 7.54, 255 boosting estimators, minimum child weight = 6, and gamma = 0.50.
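As a convenience, the reported settings can be written as a parameter dictionary using the argument names of the XGBoost scikit-learn interface. The mapping of "column subsampling rate" to `colsample_bytree` is an assumption, since the text does not say whether subsampling was per tree, per level, or per node; the remaining names follow the usual XGBoost conventions.

```python
# Values copied from the text; keys follow XGBoost's scikit-learn-style names.
tuned_params = {
    "max_depth": 8,
    "learning_rate": 0.078,
    "subsample": 0.737,
    "colsample_bytree": 0.865,  # assumed variant of "column subsampling rate"
    "reg_alpha": 0.96,          # L1 regularization
    "reg_lambda": 7.54,         # L2 regularization
    "n_estimators": 255,
    "min_child_weight": 6,
    "gamma": 0.50,
}

# With the xgboost package installed, the dictionary could be unpacked as
# xgboost.XGBClassifier(**tuned_params); that call is not executed here.
for name, value in tuned_params.items():
    print(f"{name} = {value}")
```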

Within the proposed stacking framework, which includes a neural network model, the hybrid ensemble showed excellent discrimination. On the independent test set (n = 305), the hybrid model gave a ROC-AUC of 0.911, which represents excellent separation of low- and high-risk maternal cases. The overall classification accuracy was 80%, and the macro-average F1-score was 0.80, indicating predictive performance that is balanced across classes.

Class-wise evaluation showed that the model had a precision of 0.89 and a recall of 0.77 for the low-risk group, and a precision of 0.71 and a recall of 0.85 for the high-risk group. Notably, the higher recall for the high-risk category shows that the model is sensitive in detecting at-risk maternal cases, which is clinically desirable in screening applications.
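The class-wise figures can be checked directly against the counts in Table 5. The sketch below recomputes precision, recall, F1-score, overall accuracy, and macro-average F1 from that confusion matrix; only the counts come from the paper, while the dictionary layout and the helper name `per_class_metrics` are illustrative choices.

```python
# Counts from Table 5 (outer key: actual class, inner key: predicted class).
confusion = {
    "low": {"low": 141, "high": 42},
    "high": {"low": 18, "high": 104},
}


def per_class_metrics(confusion, cls):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    tp = confusion[cls][cls]
    fn = sum(count for pred, count in confusion[cls].items() if pred != cls)
    fp = sum(row[cls] for actual, row in confusion.items() if actual != cls)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[c][c] for c in confusion) / total
metrics = {c: per_class_metrics(confusion, c) for c in confusion}
macro_f1 = sum(m[2] for m in metrics.values()) / len(metrics)

for cls, (precision, recall, f1) in metrics.items():
    print(f"{cls}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f} n={total}")
```

Rounded to two decimals, the recomputed values agree with the reported precision and recall for both groups, the 80% accuracy, and the macro-average F1-score of 0.80.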

These findings indicate that the stacking-based hybrid framework provides strong and clinically meaningful predictive performance for maternal health risk prediction.

Discussion

In this research, a leakage-controlled hybrid ensemble framework based on optimized XGBoost and stacking strategies was created and assessed as a predictive model of maternal health risk. The proposed strategy showed good discriminative performance on an independent test set (ROC-AUC = 0.911), with balanced predictive performance across classes and clinically relevant sensitivity to high-risk cases (recall = 0.85). These results show that well-validated ensemble learning models may be helpful for maternal risk stratification based on structured healthcare data.

The optimized XGBoost model (ROC-AUC = 0.90) had better discriminative capacity than the baseline models (random forest, SVM, and logistic regression), but the hybrid stacking model had stronger overall predictive strength. The improvement from the stacking framework suggests that, when heterogeneous learners are aggregated, the meta-model can combine the complementary predictive patterns captured by the gradient boosting and neural network representations. Importantly, despite an overall accuracy of 80%, the model showed higher recall for high-risk pregnancies, which is especially valuable in screening programs where reducing false negatives is clinically important. The probability mean squared error (Brier score = 0.07) suggests that the predicted probabilities are well calibrated. This implies that the model not only discriminates between risk categories but also produces reliable probability estimates, which are fundamental in clinical decision-support systems. In digital health applications, both discrimination and calibration are required for reliable deployment. Methodologically, this research emphasizes the need to avoid data leakage in healthcare machine learning. We ensured unbiased evaluation by performing the stratified split before SMOTEENN resampling and by tuning hyperparameters on the training data only. Previous studies reporting high accuracy values often do not clearly specify their leakage control strategies. The revised pipeline used in this study addresses this issue and increases reproducibility.

Compared with prior maternal health risk prediction studies, which have reported accuracies of 86% to 91%, our findings show similar or better discrimination while being methodologically transparent. This work prioritizes validation rigor and clinical interpretability over inflated accuracy values. Structured analytical frameworks are increasingly used on large-scale real-world health data to support predictive modeling and healthcare system surveillance. For example, Nasehi et al. demonstrated the usefulness of standardized CRISP-DM-based pipelines for national healthcare analytics. Similarly, population-level predictive analysis based on health insurance data has been used in medication safety and pharmacoepidemiological surveillance. These works highlight the broader translational potential of structured machine learning frameworks in digital health ecosystems. Overall, the results suggest that ensemble-based maternal risk prediction systems are feasible and emphasize the need to integrate discrimination measures, calibration evaluation, and leakage-controlled validation methods in clinical machine learning studies.

Beyond maternal health risk prediction, structured machine learning frameworks have been applied effectively to population-level health analytics. Recent findings based on real-world health insurance data proved useful for medication safety assessment and pharmacoepidemiological monitoring.²⁸ This result highlights the broader translational potential of standardized predictive modeling pipelines within digital health ecosystems. The present research fits this paradigm because it applies a leakage-sensitive ensemble model that follows reproducible analytical standards.²⁹

The predictive variables in the model are clinically relevant to monitoring maternal health. Blood pressure parameters, namely systolic and diastolic blood pressure, are established predictors of pregnancy complications (e.g., preeclampsia and gestational hypertension). Elevated systolic or diastolic blood pressure in pregnancy is closely linked to maternal morbidity and poor fetal outcomes. Blood glucose level is also important for diagnosing gestational diabetes, which increases the risk of maternal complications and childhood health problems. Body temperature and heart rate can indicate ongoing physiological stress or infection during pregnancy, which can raise maternal risk. Age is another clinically meaningful factor, since both very young and older maternal ages are known to increase pregnancy-related risks. The predictive features used by the proposed model therefore correspond to known obstetric risk factors, which supports the clinical interpretability of the machine learning predictions.

Altogether, integrating machine learning-based maternal risk prediction into routine prenatal care may improve early screening, evidence-based clinical decision-making, and the timely detection of high-risk pregnancies. By offering valid risk stratification from routinely collected clinical parameters, such systems can enhance the quality of maternal healthcare delivery, especially in low-resource settings, while final clinical decisions remain under the control of medical personnel.

Ethical considerations

The development and use of machine learning models in medicine require careful attention to ethical and responsible AI principles. The dataset in this study is publicly accessible and anonymized, and no identifiable patient information was obtained. Institutional review board approval was therefore not required. Nevertheless, ethical concerns remain significant when predictive models are implemented in clinical practice.

First, patient confidentiality and information security should be the priority when machine learning systems are incorporated into electronic health records. Second, algorithmic bias deserves close attention, especially when a model is trained on small datasets that may not represent diverse populations. The model should therefore be validated externally on multi-center clinical data to establish fairness and generalizability. Third, transparency and interpretability are crucial for clinical AI systems, because health workers need to know how predictions are made. Finally, predictive models must be applied as decision-support tools and not as autonomous decision-makers, so that final medical decisions remain under the jurisdiction of qualified healthcare professionals.

Responsible integration of artificial intelligence technologies into maternal healthcare systems requires addressing these ethical considerations.

Limitations

This research has several limitations. The dataset is modest in size (n = 1015) and was obtained from a single open source. Generalizability should be confirmed through external validation on multi-center clinical data. In addition, only structured tabular features were considered; adding longitudinal clinical and demographic variables could improve predictive performance.

Conclusion

This study described a leakage-controlled hybrid ensemble framework for predicting maternal health risk using optimized XGBoost and stacking-based models. Methodological transparency and unbiased assessment were achieved through a stratified 80/20 split, application of SMOTENN to the training data only, and hyperparameter tuning via cross-validation. The proposed hybrid model achieved high discriminative performance (ROC-AUC = 0.911), balanced class-specific predictive accuracy, and clinically meaningful sensitivity for high-risk maternal cases. In addition, the probabilistic calibration evaluation indicated credible risk estimates, which supports the model's potential relevance for decision support. Rather than focusing on inflated accuracy scores, the present study highlights the need for rigorous validation design, leakage avoidance, and probabilistic evaluation in digital health machine learning research. The results indicate that ensemble-based models can offer effective and clinically meaningful maternal risk stratification when they are deployed through reproducible and transparent workflows.

Future research should externally validate the framework on multi-center clinical data and integrate it into operational digital health systems to evaluate its clinical utility and implementation.

Ethical considerations

The study used a publicly available, anonymized dataset from the UCI Machine Learning Repository. No identifiable personal information was used; hence, ethical approval and informed consent were not required.

Footnotes

Acknowledgements

The authors gratefully acknowledge the researchers and institutions that publicly released the medical dataset that made this study possible. The authors also thank the wider research community, whose members have advanced data-driven healthcare. Special thanks go to the reviewers and the editorial team, whose comments and constructive feedback played a key role in improving the quality and clarity of this manuscript.

ORCID iD

Hatem Kareem Saleem Altaie

Author contributions

H.K.S.A. conceived the study, formulated the methodology, analyzed the data, and wrote the manuscript.

A.A.I. supervised the research, revised the manuscript, and validated the results.

All authors reviewed and approved the final manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The dataset is available from the UCI Machine Learning Repository. Code is available on reasonable request.

Guarantor

The corresponding author is the guarantor of this study, and all authors take full responsibility for the integrity of the data and the accuracy of the analysis.

References

1. Akbulut, Ertugrul, Topcu. Fetal health status prediction based on maternal clinical history using machine learning techniques. Comput Methods Programs Biomed 2018; 163: 87–100. https://doi.org/10.1016/j.cmpb.2018.06.010

2. Saqib, Khan, Butt. Machine learning methods for predicting postpartum depression: scoping review. JMIR Ment Health 2021; 8: e29838. https://doi.org/10.2196/29838

3. Mutlu, Yücel, Durmaz, et al. Prediction of maternal health risk with traditional machine learning methods. Nat Eng Sci 2023; 4: 16–23. https://doi.org/10.46572/naturengs.1293185

4. Lins AJCC, Muniz MTC, Garcia ANM, et al. Using artificial neural networks to select parameters for prognosis of mild cognitive impairment and dementia in elderly individuals. Comput Methods Programs Biomed 2017; 152: 93–104. https://doi.org/10.1016/j.cmpb.2017.09.013

5. Nasehi, Effatpanah, Gholamnezhad, et al. Antibiotic prescription prevalence in Iranian outpatients: a focus on defined daily doses and the AWaRe classification system. Am J Infect Control 2024; 52: 1359–1365. https://doi.org/10.1016/j.ajic.2024.07.007

6. Tahir, Badriyah, Syarif. Classification algorithms of maternal risk detection for preeclampsia with hypertension during pregnancy using particle swarm optimization. Emit Int J Eng Technol 2018; 6: 236–253. https://doi.org/10.24003/emitter.v6i2.287

7. Shastri, Mansotra. Data mining probabilistic classifiers for extracting knowledge from maternal health datasets. Int J Innov Technol Explor Eng 2019; 9: 2769–2776. https://doi.org/10.35940/ijitee.b6633.129219

8. Betts, Kisely, Alati. Predicting common maternal postpartum complications: leveraging health administrative data and machine learning. BJOG 2019; 126: 702–709. https://doi.org/10.1111/1471-0528.15607

9. Hoodbhoy, Mohammad, Noman, et al. Use of machine learning algorithms for prediction of fetal risk using cardiotocographic data. Int J Appl Basic Med Res 2019; 9: 193–195. https://doi.org/10.4103/ijabmr.IJABMR

10. Hoffman, Roberts. A machine learning algorithm for predicting maternal readmission for hypertensive disorders of pregnancy. Am J Obstet Gynecol MFM 2021; 3: 100250. https://doi.org/10.1016/j.ajogmf.2020.100250

11. Barber, Brunner Huber, Portwood, et al. The association between having a preterm birth and later maternal mental health: analysis of U.S. pregnancy risk assessment monitoring system data. Womens Health Issues 2021; 31: 49–56. https://doi.org/10.1016/j.whi.2020.08.007

12. Raza, Siddiqui HUR, Munir, et al. Ensemble learning-based feature engineering to analyze maternal health during pregnancy and health risk prediction. PLoS One 2022; 17: e0276525. https://doi.org/10.1371/journal.pone.0276525

13. Al Mashrafi, Tafakori, Abdollahian. Predicting maternal risk level using machine learning models. BMC Pregnancy and Childbirth 2024; 24: Available at: https://link.springer.com/article/10.1186/s12884-024-07030-9

14. Raihen, Akter. Comparative assessment of several effective machine learning classification methods for maternal health risk. Comput J Math Stat Sci 2024; 3: 161–176. https://doi.org/10.21608/cjmss.2024.259490.1036

15. Chaturvedi, Balmiki, Jaiswal, et al. Maternal health risk analysis using support vector machine kernels in machine learning. Grenze Int J Eng Technol 2024; 10: 1953.

16. Ahmed. Maternal health risk dataset. UCI Machine Learning Repository. 2020; https://doi.org/10.24432/C5DP5D

17. Great Learning Team. What is label encoding in Python. Great Learning. Available at: https://www.mygreatlearning.com/blog/label-encoding-in-python/ (accessed 19 March 2024).

18. Fayez, Kurnaz. Novel method for diagnosis diseases using advanced high-performance machine learning system. Appl Nanosci 2023; 13: 1787. https://doi.org/10.1007/s13204-021-01990-6

19. Viadinugroho RAA. Imbalanced classification in Python: SMOTE-ENN method. Towards Data Science. Available at: https://towardsdatascience.com/imbalanced-classification-in-python-smote-enn-method-db5db06b8d50 (accessed 19 March 2024).

20. Fayez. Contributing to diagnoses of mental disease using new optimization machine learning methods. International Journal of Advanced Computer Science and Applications 2021; 12: 809–815.

21. Tsai, Sung. Ensemble feature selection in high dimension, low sample size datasets: parallel and serial combination approaches. Knowl Based Syst 2020; 203: 106097. https://doi.org/10.1016/j.knosys.2020.106097

22. Gupta. Blending in machine learning. Scaler Topics. Available at: https://www.scaler.com/topics/machine-learning/blending-in-machine-learning/ (accessed 20 March 2024).

23. Hansrajh, Adeliyi, Wing. Detection of online fake news using blending ensemble learning. In: Proceedings of the International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), IEEE, 2021.

24. Fayez, Kurnaz, Ata. New advanced optimization models for diagnoses of diseases with imbalanced datasets. Journal of Engineering Science and Technology 2022; 37: 308–323.

25. Genetic algorithms. GeeksforGeeks. Available at: https://www.geeksforgeeks.org/genetic-algorithms/ (accessed 20 March 2024).

26. Fayez, Ata. Application area of classification techniques in medicine. International Journal of Scientific and Technology Research 2018; 2: 1–8.

27. Mehdary, Chehri, Jakimi, et al. Hyperparameter optimization with genetic algorithms and XGBoost: a step forward in smart grid fraud detection. Sensors 2024; 24: 1230. https://doi.org/10.3390/s24041230

28. Gholamnezhad, Nasehi, Alipour, et al. Addressing medication safety in the elderly: evaluating potentially inappropriate medications using the Beers Criteria 2023 through real-world health insurance data. J Gerontol Geriatr 2024; 72: 259–269.

29. Karami, Nasehi, Gholamnezhad, et al. Uncovering antidepressant prescription patterns: insights from a nationwide study in Iran. BMC Psychiatry 2025; 25: 21.