Abstract
Introduction
Obesity is a complex, multifactorial disease whose prevalence has risen worldwide over the last two decades.1 Its prevalence is approximately 36% among people over 20 years old in the USA.2 The World Health Organization (WHO) defines obesity as an excessive accumulation of body fat that presents a risk to health.3 Although obesity is more prevalent in developed countries owing to better social and economic conditions, developing countries have seen an increasing trend in recent decades with the spread of the Western lifestyle.4 Severe obesity poses significant public health challenges because of increased mortality and complications such as cardiovascular disorders, cancer, and diabetes.5,6 Bariatric surgery is considered the most effective treatment for obesity and is preferable to some non-interventional treatments.7 Bariatric procedures may be purely malabsorptive or purely restrictive, or may combine the two mechanisms, as in Roux-en-Y gastric bypass and mini-gastric bypass (one-anastomosis gastric bypass).8 These procedures can resolve some obesity-related comorbidities by reducing the size of the gastric pouch to limit calorie intake, by inducing favorable hormonal changes that reduce appetite, and by resolving type 2 diabetes through changes in insulin production.9,10 Although patients’ quality of life generally improves, short- and long-term complications occur in approximately 20% of patients who undergo this surgery.11
Complications of bariatric surgery include tachycardia, fistula, bleeding, peritonitis, hernia, gastric erosion, anastomotic stenosis, small bowel obstruction, deep vein thrombosis, pulmonary embolism, pneumonia, malnutrition, liver and biliary disorders, and, less often, death.12 Early prediction of these complications is essential for choosing appropriate treatment strategies that improve patients’ quality of life and reduce return visits to medical centers.13,14 Machine learning (ML) approaches have become popular for prediction in many domains, including healthcare.15,16 Prediction models have been applied in various medical fields, such as heart disease and cancer.17–19 Deep learning (DL) is a category of ML that uses artificial neural networks (ANN) to learn associations between features and discover unknown patterns in complex data such as images.20,21 Whereas conventional ML algorithms typically perform well on structured datasets, DL can also handle unstructured data such as video, images, and sound.22 Depending on the data available, either approach can therefore be useful for prediction. Although several studies have applied ML and DL approaches to predict complications of bariatric surgery, no systematic literature review or meta-analysis has examined how well these approaches predict the complications. The novelty of the current study lies in an in-depth analysis of the predictive performance of ML algorithms for complications of bariatric surgery, particularly with respect to the data types used.
As the first review on this topic, this study compares trained ML algorithms to give better insight into which algorithms are best suited to predicting complications of bariatric surgery given the data used, for example, national registries or single-center databases. Such insight can help reduce complications and improve quality of life among individuals who undergo bariatric surgery. Therefore, this study aims to systematically review articles that used ML techniques to predict complications of bariatric surgery and to suggest the best solutions based on the knowledge gained from them.
Related research
The previous systematic reviews on a similar topic.
As Table 2 shows, two of the three previous systematic reviews focused on ML approaches for predicting abdominal complications, and the third addressed gastrointestinal surgery in general. No systematic review has investigated the predictive performance of ML for complications of bariatric surgery.
Methods
Data sources and search strategy
The search strategy of the current review.
Inclusion and exclusion criteria
The inclusion criteria were original papers written in English and published up to August 2023 in academic journals or international conference proceedings, with full text available, that predicted complications of bariatric surgery using ML algorithms. Review papers, case reports, case studies, books, e-books, theses and dissertations, letters to the editor, symposiums, posters, guidelines, and studies on readmission or quality of life after bariatric surgery were excluded.
Study selection
Papers were selected for inclusion in three steps. First, the author checked all articles retrieved from the scientific databases for duplicates. Second, the author and one medical informatics specialist independently screened the titles and abstracts against the inclusion and exclusion criteria. Third, the full texts of the screened articles were assessed for eligibility. Any discrepancy in the screening results was referred to a second medical informatics specialist for a final decision.
Data extraction
Relevant data were extracted from the selected articles using a data extraction form (Supplementary C) with two main sections: (1) general information, including the authors’ names, place of publication, and publication date; and (2) specific information, including the population, study type, databases used, algorithms used, method of dealing with imbalanced data, and prediction performance results. The author and one medical informatics specialist extracted the data independently. Any disagreement between them was resolved by a second medical informatics specialist.
Because of the considerable heterogeneity between studies on this topic (in sample size and outcomes), a meta-analysis was not feasible for this systematic review. Instead, we analyzed the data using a narrative synthesis of quantitative data on the performance of the algorithms in the included studies. To ensure the validity of this synthesis, the author and two medical informatics specialists cross-checked and discussed the results of the review.
Population
Studies of patients who underwent bariatric surgery and experienced physical or mental problems associated with the surgery were included. Studies of participants who underwent abdominal surgery for other purposes, and studies that did not address complications, were excluded.
Intervention
Studies that developed prediction models using ML algorithms were included. Studies that used other predictive approaches, such as conventional statistical methods, were excluded. Prediction models with or without external validation were eligible.26–30
Outcome
The outcome was the predictive performance of ML algorithms for complications of bariatric surgery, measured by the area under the receiver operating characteristic curve (AUC). AUC is widely used to measure and compare predictive and diagnostic performance in fields such as medicine.31
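As a reminder of what the AUC summarizes, the sketch below computes it from its probabilistic interpretation: the chance that a model assigns a higher risk score to a randomly chosen patient who developed a complication than to one who did not, with ties counted as one half (an AUC of 0.5 corresponds to chance-level ranking). The function name and the labels and scores are invented for illustration and are not data from the included studies; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
from typing import Sequence


def auc_by_ranking(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate ROC AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative case count as one half. This is the
    normalized Mann-Whitney U statistic; the O(n_pos * n_neg) loop is for illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = complication occurred, 0 = no complication.
observed = [1, 1, 0, 0, 1, 0]
predicted_risk = [0.9, 0.4, 0.35, 0.1, 0.8, 0.5]
print(round(auc_by_ranking(observed, predicted_risk), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

Because it depends only on how cases are ranked, the AUC is unaffected by the choice of a decision threshold, which is one reason it is convenient for comparing models across studies.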
Risk of bias assessment
The risk of bias (RoB) of the included studies was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST) (Supplementary B).27
Results
Characteristics of the included studies
As shown in Figure 1, the database searches initially returned 174 articles. After 49 duplicate articles were removed, 125 remained. Title screening excluded 83 articles, leaving 42. Abstract screening excluded 30 irrelevant articles, leaving 12 potentially eligible articles. Finally, after excluding articles without available full text or unrelated to this topic (n = 5), seven papers were included in this review. Table 3 shows the data extracted from the included articles.
Figure 1. PRISMA flow diagram of selected relevant articles.
Table 3. The data extracted from articles. Abbreviations: 1, logistic regression; 2, linear discriminant analysis; 3, quadratic discriminant analysis; 4, decision tree; 5, k-nearest neighbor; 6, support vector machine; 7, multi-layered perceptron; 8, deep neural network; 9, deep belief network; 10, multivariable logistic regression; 11, convolutional neural network; 12, recurrent neural network; 13, random forest.
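The screening counts reported for the PRISMA flow can be checked with a simple running tally. The sketch below uses only the numbers stated in the text; the function and stage labels are illustrative names, not part of the review's methods.

```python
from typing import List, Sequence, Tuple

# Screening stages and exclusion counts as reported in the text (Figure 1).
STAGES: List[Tuple[str, int]] = [
    ("duplicates removed", 49),
    ("excluded after title screening", 83),
    ("excluded after abstract screening", 30),
    ("excluded at full-text review", 5),
]


def prisma_flow(identified: int, stages: Sequence[Tuple[str, int]]) -> List[int]:
    """Return the number of records remaining after each screening stage."""
    remaining = identified
    counts = []
    for stage, excluded in stages:
        remaining -= excluded
        if remaining < 0:
            raise ValueError(f"more records excluded than available at stage: {stage}")
        counts.append(remaining)
    return counts


for (stage, excluded), left in zip(STAGES, prisma_flow(174, STAGES)):
    print(f"{stage}: -{excluded} -> {left} remaining")
```

The tally confirms that the 174 identified records reduce to 125, 42, 12, and finally the seven included papers.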
As shown in Table 3, the included articles were published between 2017 and 2020. Figure 2 depicts the number of included articles published per year: 2017 (n = 1), 2018 (n = 1), 2019 (n = 2), and 2020 (n = 3), suggesting growing interest in this topic in recent years. The included studies were conducted in the USA and Sweden.
Figure 2. The distribution of published articles included in this review.
The included studies used structured clinical databases, either a single-center database or a national registry.
Risk of bias assessment
The results of the RoB and applicability assessments are presented in Figures 3 and 4, respectively.
Figure 3. Prediction Model Risk of Bias Assessment Tool (PROBAST) assessment of the included studies: risk of bias.
Figure 4. Prediction Model Risk of Bias Assessment Tool (PROBAST) assessment of the included studies: applicability concern.

Figure 3 shows the risk of bias assessed with PROBAST: three studies had a low risk of bias, three an unclear risk, and one a high risk. The high risk of bias in that study was due to univariable selection of predictors. In one study, the risk of bias in the analysis was unclear because of a lack of information on the multivariable analysis. The participant and outcome domains were at low risk of bias in all included studies. In three studies, predictors were not defined and assessed for all participants. Figure 4 shows the applicability concerns of the included studies. Applicability concern was low for six studies because their participants and setting, the definition, assessment, and timing of predictors in the model, and the outcome and its definition matched the review’s objectives.
Algorithms used and performance evaluation
Some studies reported average predictive results across complications of bariatric surgery, whereas others reported predictive performance for each complication separately. The algorithms used in previous studies are depicted in Figure 5.
Figure 5. The algorithms used for predicting the complications.
As shown in Figure 5, ANN, MLP, RF, LR, and MLR were each used in two studies (n = 2), more often than other ML algorithms. Both ensemble and non-ensemble algorithms were used in the included studies. Figure 6 shows, for each study, either the average predictive performance of the ML algorithms across complications or their performance for each complication separately. The predictive performance of ML algorithms differed by the type of database used. Because some studies reported only an average AUC across complications rather than the AUC of each complication, we report the mean AUC for those studies. We summarize the results of the studies by complication type and database.
Figure 6. The AUC of best-performing ML algorithms for complications.
ML algorithms with AUCs ranging from 0.53 to 0.58 did not achieve satisfactory predictive performance, indicating low generalizability in those studies.28,31,34 Among the studies that reported average performance across complications, the RF, AdaBoost, and bagging algorithms, with an AUC of 0.91, performed better than other algorithms at the national level. Although Razzaghi’s study used the SMOTE technique, its AUC of 0.91 was more favorable than the AUCs of 0.64 to 0.75 in Nudel’s study.33 This suggests that Razzaghi’s study is more clinically applicable. Generally, at the national level, Razzaghi’s database yielded better generalizability than the SOReg and the national registry of MBS.33 MLR and various ANN configurations, with AUCs between 0.5 and 0.6, performed poorly on this type of database.
More specifically, comparing the ML algorithms by complication type in two studies29,32 showed that the DBN achieved AUCs of 0.94, 0.917, 0.891, and 0.834 for diabetes, dyslipidemia, hypertension, and sleep apnea, respectively, outperforming the other ML algorithms. The DBN and ANN, each with an AUC of 0.75, performed best for depression and leakage, respectively. ANN, XGBoost, and LR, with AUCs ranging from 0.67 to 0.75 for thrombosis, achieved near-satisfactory predictive ability compared with other ML algorithms. In one study,30 an ANN with an AUC of 0.82 achieved satisfactory predictive ability for complications of bariatric surgery using a single-center database.
Feature importance
In the national registry studies, several factors ranked highest in relative importance: age, BMI, height, revision surgery, waist circumference, dyspepsia, ethnicity, operation year, and laboratory information. Feature importance was not reported for the single-center database. Generally, demographic and laboratory features showed greater predictive value in the national registries. However, two of the three studies that reported feature importance achieved AUCs below 0.6, so their top-ranked features cannot be considered generalizable predictors for other clinical settings. In one study, age, BMI, weight, hematocrit, height, albumin, training level of the first assistant, and ethnicity were reported as essential predictors of venous thrombosis and leakage, which were predicted with AUCs of nearly 0.7.
Discussion
This study investigated and narratively synthesized the ability of ML algorithms to predict complications of bariatric surgery, by data type and complication type, to provide better insight for prediction. The results showed that the RF, bagging, AdaBoost, DBN, and ANN algorithms achieved satisfactory performance on this task. RF combines multiple decision trees (DTs); as an ensemble, it often achieves strong predictive performance across test scenarios.35 Hsu et al. concluded that the RF algorithm had the best predictive power for gastrointestinal bleeding after bariatric surgery.36 Butler et al., predicting readmission after bariatric surgery, found that RF, with an AUC of 0.785 (95% CI = [0.784–0.785]), outperformed other ML algorithms.14 Weerakoon et al. applied different ML algorithms to predict weight loss after bariatric surgery and found that the RF model performed best, with an accuracy of 95%–97%.37 Cao used a CNN to predict long-term health-related quality of life among patients who underwent bariatric surgery; compared with LR, the CNN achieved an 8%–80% lower mean squared error.38 Sheikhtaheri et al. developed an ANN-based clinical decision support system (CDSS) to predict short-term complications of gastric bypass surgery; it predicted 10-day, 1-month, and 3-month complications with 98.4%, 96%, and 89.3% accuracy, respectively. Cao et al. used a CNN-based model to predict remission of type 2 diabetes after bariatric surgery; it predicted pharmacological and complete remission with AUCs of 0.85 and 0.83, respectively, 9%–11% higher than traditional prediction methods.39
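The ensemble principle behind RF, in which many decision trees are each trained on a resampled view of the data and then vote, can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration rather than the implementation used by any included study: it uses one-split trees (decision stumps), draws a random feature subset once per tree instead of at every split as a full random forest does, and runs on invented toy data. Maintained implementations exist in scikit-learn as `RandomForestClassifier`, `BaggingClassifier`, and `AdaBoostClassifier`.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature index, threshold, label if <=, label if >)


def majority(labels: Sequence[int]) -> int:
    """Most common label; defaults to 0 for an empty branch."""
    return Counter(labels).most_common(1)[0][0] if labels else 0


def fit_stump(X: List[List[float]], y: List[int], features: Sequence[int]) -> Stump:
    """Fit a one-split decision tree that minimizes training errors on the given features."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, (f, threshold, left_label, right_label))
    return best[1]


def fit_ensemble(X, y, n_trees: int = 25, n_features: int = 1, seed: int = 0) -> List[Stump]:
    """Bagging with random feature subsets: each stump sees a bootstrap sample and random features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        features = rng.sample(range(d), n_features)  # random subset of candidate features
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return ensemble


def predict_proba(ensemble: List[Stump], row: Sequence[float]) -> float:
    """Fraction of stumps voting for class 1 (complication)."""
    votes = [left if row[f] <= t else right for f, t, left, right in ensemble]
    return sum(votes) / len(votes)


# Hypothetical toy data: two invented numeric predictors, 1 = complication.
X = [[35.0, 30.0], [38.0, 35.0], [40.0, 41.0], [52.0, 58.0], [55.0, 62.0], [60.0, 66.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_ensemble(X, y)
print(predict_proba(model, [60.0, 66.0]), predict_proba(model, [35.0, 30.0]))
```

Averaging the votes of many trees fitted to different bootstrap samples reduces the variance of any single tree, which is the usual explanation for why ensembles such as RF and bagging tend to be robust on large registries.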
Most previous studies on complications of bariatric surgery were conducted in the USA and Sweden, reflecting the importance of adjunctive strategies to combat obesity in developed countries. Few studies on this topic have been conducted in developing countries, even though nutritional and epidemiological transitions in recent years have driven an upward trend in obesity there, increasing the need for supportive strategies such as bariatric surgery. Previous studies on this topic were retrospective. Future studies should use prospective cohorts in different populations to improve data quality, such as completeness, and the accuracy of the data mining process.
Moreover, no previous systematic review has addressed the use of ML to predict complications of bariatric surgery. Stam et al.23 investigated the role of ML in the early detection of complications or mortality after any gastrointestinal surgery; in their review, ML techniques achieved AUCs ranging from 0.50 to 0.96 for predicting complications, and the upper value of 0.96 is noteworthy for prediction. In Henn’s review,24 ML algorithms achieved a mean AUC of 0.84, indicating favorable predictive performance for complications of abdominal surgery. In a meta-analysis, Wang et al.26 concluded that ML is effective for predicting complications of gastrointestinal surgery (ΔAUC, 0.07; 95% CI, 0.04–0.09; p < .001). In the current systematic review, ML algorithms achieved AUCs of 0.53 to 0.942, which shed light on their performance by data type and complication type. The current study differs from these reviews in scope: it deals specifically with the role of ML in predicting complications of bariatric surgery, whereas the others focused on gastrointestinal or abdominal surgery. Because of substantial heterogeneity between studies on this topic, a meta-analysis was not feasible, so we used a narrative synthesis of the quantitative data to analyze the performance of ML algorithms in different situations.
The current review showed that ensemble algorithms, including RF, AdaBoost, and bagging, with an AUC of 0.91 on a national registry, outperformed other algorithms for predicting complications. This predictive performance was also superior to that in Cao’s studies28,31,34 in terms of external validity and generalizability. In some of Cao’s studies, although SMOTE was used to balance the data, we did not observe better generalizability of the ML algorithms. This suggests, on the one hand, that the minority class in the SOReg registry was insufficient for predicting complications of bariatric surgery and, on the other, that SMOTE had little effect in resolving the data imbalance. In this scenario, undersampling techniques might yield better predictive performance.
Undersampling techniques are less prone to overfitting than oversampling techniques, especially when the minority class is small within a large dataset, so they are probably the better strategy in that setting. For small datasets, oversampling may be preferable, although there is still a risk of overfitting. Oversampling techniques increase the number of minority-class cases with synthetic cases generated by algorithms such as KNN, so the generalizability of the algorithms may be affected, as mentioned.40 Nudel’s study,32 which did not mention oversampling or undersampling methods, reported AUCs of 0.64 to 0.75 for ANN, XGBoost, and LR, indicating near-satisfactory performance for predicting complications. Based on previous studies, we recommend ensemble algorithms for prediction with national registry data. As observed, the effectiveness of strategies for handling data imbalance depended strongly on the database and on the generalizability of the algorithms at the national level. In some scenarios, oversampling may yield greater accuracy and generalizability, especially if the minority-class samples are representative at this level. Otherwise, when the minority class is not representative, undersampling is the better strategy, because reducing the majority class may yield better generalizability and predictive performance. With a single-center database, the ANN with an AUC of 0.82 achieved satisfactory performance in predicting complications of bariatric surgery. At this level, we suggest algorithms with simpler configurations for more efficient prediction.
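The two imbalance strategies discussed above can be contrasted in a minimal sketch. Random undersampling discards majority-class rows, while the SMOTE-style function creates synthetic minority rows by interpolating between a minority case and one of its nearest minority-class neighbours, which is the KNN-based mechanism referred to in the text. Both functions are hypothetical, simplified helpers written for illustration, and they assume binary labels with 1 (complication) as the minority class; the imbalanced-learn package provides maintained implementations (`RandomUnderSampler`, `SMOTE`).

```python
import math
import random
from typing import List, Tuple

Dataset = Tuple[List[List[float]], List[int]]


def random_undersample(X: List[List[float]], y: List[int], seed: int = 0) -> Dataset:
    """Randomly drop majority-class rows (label 0) until both classes are the same size.

    Assumes binary labels where 1 (complication) is the minority class.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like_oversample(X: List[List[float]], y: List[int], n_new: int,
                          k: int = 2, seed: int = 0) -> Dataset:
    """Append n_new synthetic minority rows, SMOTE-style.

    Each synthetic row lies on the segment between a random minority row and one
    of its k nearest minority-class neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == 1]
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return X + synthetic, y + [1] * n_new


# Hypothetical toy data: six majority rows (0) and two minority rows (1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.15, 0.05], [0.25, 0.2], [1.0, 1.0], [1.2, 0.8]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_under, y_under = random_undersample(X, y)
X_over, y_over = smote_like_oversample(X, y, n_new=4, k=1)
print(len(y_under), sum(y_under))  # 4 2
print(len(y_over), sum(y_over))    # 12 6
```

Whichever strategy is chosen, resampling should be applied only to the training portion of the data (after the train/test split, or within each cross-validation fold). Resampling before splitting lets synthetic or duplicated minority cases leak into the evaluation set and inflates the reported AUC, which bears directly on the generalizability concerns raised above.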
Among previous studies, the DBN in Cao’s study29 achieved AUCs of 0.94, 0.917, 0.891, and 0.834 for predicting diabetes, dyslipidemia, hypertension, and sleep apnea, respectively, outperforming the other ML algorithms. Although ML algorithms trained on the SOReg registry lacked generalizability for general complications, making them unsuitable for future work in that respect, the same registry supported efficient prediction of these specific complications. For thrombosis and leakage, ANN, XGBoost, and LR, with AUCs ranging from 0.67 to 0.75, outperformed other algorithms. Thus, both ensemble and non-ensemble algorithms can achieve favorable predictive performance for these complications.
The univariate feature selection used in two studies on this topic28,33 is not a robust technique. Multivariable methods, such as logistic regression, provide more insight into the factors essential for prediction, and these were used in other previous studies.29,31,32,34 Identifying the factors that influence complications of bariatric surgery can also substantially enhance the clinical applicability of research, as in Nudel’s study.32 We suggest post-training feature-ranking techniques tailored to the outcome of interest, for example, the relative importance (RI) of the best-performing algorithm (as used in Nudel’s study32), SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), or permutation feature importance. The other studies on this topic did not use these techniques; future studies should apply them to improve the explainability of algorithms for complications of bariatric surgery.
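Permutation feature importance, one of the model-agnostic techniques suggested above, can be sketched without any library: shuffle one predictor at a time and measure how much performance drops. The model, its BMI cut-off, and the data below are invented for illustration, and accuracy is used as the score for brevity (AUC could be substituted); scikit-learn offers a maintained version as `sklearn.inspection.permutation_importance`.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    """Share of rows where the model's predicted label matches the observed label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: List[List[float]], y: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when one feature column is randomly shuffled.

    Shuffling breaks the link between that feature and the outcome while keeping
    its distribution; a large drop means the model relies on the feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical "fitted" model that uses only feature 0 (an invented BMI cut-off of 45).
def toy_model(row: Sequence[float]) -> int:
    return 1 if row[0] >= 45 else 0


# Invented data: feature 0 drives the outcome, feature 1 is unrelated noise.
X = [[38.0, 7.0], [41.0, 3.0], [43.0, 9.0], [40.0, 1.0],
     [48.0, 4.0], [52.0, 8.0], [47.0, 2.0], [55.0, 6.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_importance(toy_model, X, y))
```

Because the model ignores feature 1, shuffling it leaves accuracy unchanged and its importance is exactly zero, whereas shuffling feature 0 lowers accuracy. Computing importances on held-out data rather than the training set gives a better indication of which predictors generalize to other clinical settings.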
From an informatics point of view, the results of the current study have clinical applications. The best-performing ML model could serve as an efficient clinical knowledge base for intelligent CDSSs in healthcare environments that predict complications of bariatric surgery more effectively. Based on previous studies, features including age, BMI, weight, hematocrit, height, albumin, training level of the first assistant, and ethnicity are essential, especially at the national level. CDSSs built on these features would help doctors assess patients’ status in the context of bariatric surgery and act on the system’s suggestions for high-risk patients, supporting better individual decisions and clinical measures, such as preventive, diagnostic, or therapeutic actions, to reduce complications of this surgery.
Limitations
Despite these strengths, this review has some limitations. Some studies that used ML algorithms to predict complications of bariatric surgery relied on national registries, whereas others were based on single-center databases. Substantial differences in sample size between studies prevented us from statistically combining the results, so a meta-analysis was not possible and a narrative synthesis of the quantitative data was carried out instead. In some studies, ML algorithms predicted each complication separately, whereas in others, results were reported for complications overall. This increased the heterogeneity between studies and prevented us from synthesizing each complication separately.
Few studies have used ML algorithms to predict complications of bariatric surgery, so only seven papers were included in the current review. Although we could have included other outcomes, such as health-related quality of life after surgery, to increase the number of included studies, focusing on the various complications of bariatric surgery has potential advantages for preventing the negative consequences of this surgery. Moreover, there is no required minimum number of papers for a review. Another limitation was the limited discussion of feature importance in this review, because some studies did not report the predictors and their importance, and the studies that cited top-ranked features had algorithms with low generalizability. Under these conditions, recommending the best predictors from such algorithms would not have been justified.
Conclusion
This review provides insight into the performance of different ML algorithms for predicting complications of bariatric surgery, based on the results of previous studies on this topic. We used a narrative synthesis of quantitative data to analyze and compare the predictive performance of ML algorithms by database and complication type. Ensemble algorithms performed satisfactorily on large datasets, especially national registries. The ANN outperformed other algorithms with a single-center database. The DBN performed best for predicting complications such as diabetes, dyslipidemia, hypertension, sleep apnea, and depression, and the ANN, LR, and XGBoost performed better in predicting thrombosis and leakage. We conclude that ML algorithms can perform efficiently and can serve as prediction models that form the knowledge base of intelligent systems aimed at minimizing complications. Such systems could deliver more evidence-based, personalized clinical recommendations that help doctors make more effective clinical decisions in healthcare settings.
Supplemental Material
Supplemental Material - Comparison of machine learning models to predict complications of bariatric surgery: A systematic review
Supplemental Material for Comparison of machine learning models to predict complications of bariatric surgery: A systematic review by Raoof Nopour in Health Informatics Journal
Footnotes
Acknowledgements
We would like to thank all gastrointestinal surgeons affiliated with Mazandaran University of Medical Sciences (MAZUMS) who assisted us in conducting this study.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
