Abstract
Background
The risk of hypoxemia in adult (18–64) patients undergoing esophagogastroduodenoscopy (EGD) under sedation often poses a dilemma for anesthesiologists. We aimed to establish an artificial neural network (ANN) model to predict this risk and to apply the Shapley additive explanations (SHAP) algorithm to improve the model's interpretability.
Methods
Data were collected from patients who underwent routine anesthesia-assisted EGD. The elastic network algorithm was used to select the optimal features. Two models were established: Airway-ANN, based on all collected indicators, and Basic-ANN, based on the remaining variables after excluding airway assessment indicators. The performance of Basic-ANN, Airway-ANN and the STOP-BANG score was evaluated by the area under the precision-recall curve (AUPRC) on a temporal validation set. SHAP was used to reveal the predictive behavior of the best model.
Results
A total of 999 patients were included. The AUPRC of the Airway-ANN model was significantly higher than that of the Basic-ANN model in the temporal validation set (0.532 vs 0.429, P < 0.05), and both ANN models performed significantly better than the STOP-BANG score (both P < 0.05). The Airway-ANN model was deployed to the cloud (http://njfh-yxb.com.cn:2022/airway_ann).
Conclusion
Our online interpretable Airway-ANN model showed satisfactory ability to identify the hypoxemia risk in adult (18–64) patients undergoing EGD.
Introduction
Hypoxemia is the most common complication of esophagogastroduodenoscopy (EGD) under sedation, with an incidence of 10% to 70%.1–3 It prolongs the operation time and impairs the patient experience, and prolonged hypoxemia may lead to more serious complications.1,4 In the outpatient setting, with its large patient volume and rapid workflow, anesthesiologists usually devote more attention to high-risk elderly patients because of their special physiological conditions, such as poorer metabolism and greater sensitivity to sedatives.5,6 Less attention is paid to relatively young patients, who may therefore not be rescued in time if they develop serious complications. Preoperative evaluation of the risk of intraoperative hypoxemia in adult (18–64) patients could therefore help anesthesiologists allocate their attention reasonably.
At present, few specific models are available to predict the hypoxemia risk in EGD patients.7,8 Anesthesiologists are often required to measure multiple indicators for pre-anesthesia evaluation, including the STOP-BANG score, body mass index (BMI) and airway assessment.9 However, these indicators are complex, and reviewing and analyzing them comprehensively before each EGD procedure is time-consuming and laborious, which runs counter to the rapid workflow of the outpatient clinic. A new prediction tool that can integrate multi-dimensional indicators and make full use of the information in the data is required.
Artificial neural networks (ANNs) have been widely studied in the medical field recently.10,11 They have shown an advantage over traditional regression models in identifying nonlinear relationships.12 In addition, an ANN can learn from large datasets and discover hidden relationships between input and output parameters. Although ANNs suffer from a black-box problem, in that they cannot directly explain how a prediction is derived, some studies have applied Shapley additive explanations (SHAP) analysis to interpret the predicted results and thereby improve usability and credibility.13,14
In this study, we aimed to establish an ANN model to predict the risk of hypoxemia in adult (18–64) patients during EGD, and introduce the SHAP algorithm to improve the model's interpretability and clinical practicability. We also attempted to explore whether appending airway assessment indicators to the ANN model would improve the predictive ability of the model.
Method
Study population
This is a retrospective study of patients who underwent routine anesthesia-assisted EGD between June and September 2021. All data were collected by professional anesthesiologists from the electronic medical record system of Nanjing First Hospital. The study was conducted under a protocol approved by the Nanjing First Hospital ethics committee, and the requirement for informed consent was waived (approval number: KY20220509-01-KS-01). The inclusion criteria were age ≥ 18 and < 65 years,15 and American Society of Anesthesiologists (ASA) class I–III. The exclusion criteria were baseline pulse oximeter oxygen saturation (SpO2) < 95%, hemoglobin < 100 g/L, pregnancy, upper respiratory tract infection, significant cardiopulmonary diseases such as acute myocardial infarction or heart or respiratory failure, and incomplete clinical information.
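The inclusion and exclusion criteria above can be expressed as a single screening predicate. The following is a minimal sketch only: the `Patient` record and all of its field names are hypothetical and do not represent the study's actual data schema.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical field names chosen for illustration.
    age: int
    asa_class: int                 # 1-5, standing for ASA class I-V
    baseline_spo2: float           # percent
    hemoglobin_g_per_l: float
    pregnant: bool = False
    upper_respiratory_infection: bool = False
    significant_cardiopulmonary_disease: bool = False
    complete_record: bool = True


def is_eligible(p: Patient) -> bool:
    """True if the inclusion criteria are met and no exclusion criterion applies."""
    included = 18 <= p.age < 65 and 1 <= p.asa_class <= 3
    excluded = (
        p.baseline_spo2 < 95
        or p.hemoglobin_g_per_l < 100
        or p.pregnant
        or p.upper_respiratory_infection
        or p.significant_cardiopulmonary_disease
        or not p.complete_record
    )
    return included and not excluded
```

For example, `is_eligible(Patient(45, 2, 98, 130))` returns `True`, whereas a 65-year-old patient or a patient with a baseline SpO2 of 94% is screened out.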
Treatment protocol and outcome assessment
Prior to EGD, anesthesiologists performed a pre-anesthesia evaluation of each patient. For patients judged to be at high risk of hypoxemia, the anesthesiologist empirically carried out preventive intervention and bispectral index-guided sedation. The preventive intervention included placement of a nasopharyngeal airway and high-flow oxygen. All patients received supplemental oxygen through a nasal cannula (3 L/min for 3 minutes). Anesthesia was induced with propofol alone or combined with low-dose etomidate. The modified observer's assessment of alertness/sedation scale (MOAA/S) was used to assess the depth of sedation, and the endoscope was inserted once the MOAA/S score was < 2. Throughout the operation, the patient was kept in a lateral position, and propofol combined with remifentanil was given to maintain the depth of sedation (MOAA/S < 2). The patient’s SpO2, electrocardiogram and blood pressure were monitored during the operation. The protective procedures were as follows: increasing the oxygen flow from 3 to 6 L/min (once 90% ≤ SpO2 < 95%); increasing the oxygen flow to 6 L/min and simultaneously opening the airway with the jaw-thrust maneuver (75% ≤ SpO2 < 90% for < 60 s); performing mask ventilation (75% ≤ SpO2 < 90% for ≥ 60 s, or SpO2 < 75%); and performing tracheal intubation if the hypoxia still could not be corrected. After the operation, patients were transferred to the post-anesthesia care unit when their MOAA/S score was ≥ 4. Hypoxemia was defined as at least one pulse oximetry reading < 90% at any time during EGD (from the beginning of anesthesia induction to transfer to the post-anesthesia care unit), in the absence of probe misalignment and regardless of episode duration.2,16–18
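The stepwise protective procedures and the outcome definition can be summarized in code. The sketch below is an illustrative reading of the protocol text, not clinical software; the function names and the returned strings are hypothetical, and the final step (tracheal intubation when hypoxia cannot be corrected) is a clinical judgment that is deliberately not encoded.

```python
from typing import Sequence


def protective_step(spo2: float, seconds_below_90: float = 0.0) -> str:
    """Map the current SpO2 (%) and the time spent below 90% to the described step."""
    if spo2 >= 95:
        return "no escalation"
    if spo2 >= 90:
        return "increase oxygen flow to 6 L/min"
    if spo2 >= 75 and seconds_below_90 < 60:
        return "increase oxygen flow to 6 L/min and perform jaw thrust"
    # Remaining cases: 75% <= SpO2 < 90% for >= 60 s, or SpO2 < 75%.
    return "mask ventilation"


def is_hypoxemia(valid_readings: Sequence[float]) -> bool:
    """Outcome definition: at least one valid SpO2 reading below 90%.

    Readings taken during probe misalignment are assumed to have been
    removed by the caller before this check.
    """
    return any(reading < 90 for reading in valid_readings)
```

Under this reading, an SpO2 of 85% sustained for 30 s calls for higher oxygen flow plus jaw thrust, while the same value sustained for 60 s calls for mask ventilation.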
Data collection
The following basic clinical information was collected: age; gender; height; weight; BMI; drinking; smoking; ASA class; hemoglobin; baseline SpO2; and pre-existing medical history, such as hypertension, diabetes and chronic obstructive pulmonary disease (COPD). The following airway assessment indicators were also collected: neck circumference, thyromental distance (TMD), ratio of height to thyromental distance (RHTMD), thyromental height (TMH), sternomental distance (SMD), ratio of height to sternomental distance (RHSMD), inter-incisor distance, modified Mallampati class, and short neck. The STOP-BANG score was assessed, and patients with a score ≥ 3 were considered to be at high risk of hypoxemia. All data were collected by at least two anesthesiologists.
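Several of the collected indicators are derived from other measurements. The sketch below shows how BMI, RHTMD, RHSMD and the STOP-BANG risk flag could be computed; the function names are hypothetical, and the units (kilograms and centimeters) are an assumption, since the units are not stated in the text.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def rhtmd(height_cm: float, tmd_cm: float) -> float:
    """Ratio of height to thyromental distance (both assumed to be in cm)."""
    return height_cm / tmd_cm


def rhsmd(height_cm: float, smd_cm: float) -> float:
    """Ratio of height to sternomental distance (both assumed to be in cm)."""
    return height_cm / smd_cm


def stop_bang_high_risk(score: int) -> bool:
    """A STOP-BANG score of 3 or more is treated as high risk, as in the text."""
    return score >= 3
```

With these assumed units, a height of 168 cm and a TMD of 7 cm give an RHTMD of 24, which is of the same order as the RHTMD values reported for the individual cases in the Results.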
Data preprocessing
The basic dataset was randomly divided into a training set and a validation set (8:2). The training set was used for model development, and the validation set for internal validation and hyperparameter tuning during model training. All continuous variables were normalized by Z-score,19 and multi-class variables (more than two categories) were transformed into dummy variables by one-hot encoding.20
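The preprocessing steps can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the authors' pipeline: the function names are hypothetical, the random seed is arbitrary, and estimating the Z-score parameters on the training set only (with the sample standard deviation) is an assumed choice that the text does not specify.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def split_indices(n: int, train_fraction: float = 0.8, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and split them into training and validation parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


def zscore_params(train_values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation estimated on the training values."""
    return statistics.mean(train_values), statistics.stdev(train_values)


def zscore(values: Sequence[float], mean: float, sd: float) -> List[float]:
    """Standardize values with previously estimated parameters."""
    return [(v - mean) / sd for v in values]


def one_hot(values: Sequence[str], categories: Sequence[str]) -> List[List[int]]:
    """Encode each value as a 0/1 vector with one position per category."""
    return [[int(v == c) for c in categories] for v in values]
```

Applied to 999 rows, `split_indices` yields 799 training and 200 validation indices, which matches the training set size reported in the Results.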
Feature selection
To enhance the robustness of the model, the elastic network (EN) algorithm21 was used to screen the best features for predicting hypoxemia. EN combines the penalties of the least absolute shrinkage and selection operator (LASSO) and ridge regression.
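For reference, the elastic net is commonly written in the following standard form. The notation is an assumption and may differ from the authors' own parameterization: $y_i$ is the outcome, $x_i$ the feature vector, $\beta_0$ and $\beta$ the intercept and coefficients, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mixing parameter. For a binary outcome such as hypoxemia, the squared-error term is usually replaced by the negative log-likelihood of a logistic model.

```latex
\hat{\beta} = \underset{\beta_0,\,\beta}{\arg\min}\;
\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
+ \lambda\left[\alpha\,\lVert\beta\rVert_1
+ \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right]
```

Setting $\alpha = 1$ recovers LASSO and $\alpha = 0$ recovers ridge regression. Features whose fitted coefficients are non-zero are the ones retained.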
Model development
Two prediction models were established with the ANN algorithm: Airway-ANN, which included all collected indicators, and Basic-ANN, which included the remaining basic variables after excluding airway assessment indicators. The network structures and hyperparameters were tuned on the validation set by grid search with 10-fold cross-validation. Furthermore, cost-sensitive learning was used to compensate for the potential impact of imbalanced data on model development and to reduce the misclassification of patients at high risk of hypoxemia. The ANN models were implemented with the Python package Keras 2.4.3 (https://github.com/keras-team/keras).
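Cost-sensitive learning is often implemented by weighting the loss of each case according to its class. The sketch below illustrates the idea with inverse-frequency ("balanced") weights; this weighting scheme and the function names are assumptions, because the text does not state which scheme was used.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_log_loss(labels: Sequence[int], probs: Sequence[float],
                      weights: Dict[int, float]) -> float:
    """Mean binary cross-entropy with each case scaled by its class weight."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)
```

With 7 positive cases in 100, the positive class receives a weight of about 7.1 and the negative class about 0.54, so a missed hypoxemia case costs more than a false alarm of the same confidence. In Keras, a dictionary of this form can be passed to `Model.fit` through its `class_weight` argument.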
Model evaluation
Data from additional patients who received EGD at Nanjing First Hospital in December 2021 were collected as an independent dataset, which was used only for temporal validation to assess the generalization ability of the prediction models. The area under the precision-recall curve (AUPRC) was the main evaluation metric, because it better reflects prediction performance on imbalanced data. The receiver operating characteristic (ROC) curve was also calculated; the optimal threshold of each model was determined from it by Youden's index, and the accuracy, sensitivity and specificity were calculated at this threshold. The Brier score was used to evaluate the calibration of the model. The performance of the STOP-BANG score was compared with that of the ANN models on the same metrics.
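The three evaluation quantities can be illustrated with short reference implementations. These are sketches, not the authors' code: the function names are hypothetical, the AUPRC is computed as the step-wise "average precision" estimate, and tied scores are not given special treatment.

```python
from typing import Sequence, Tuple


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision)."""
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    n_pos = sum(y_true)
    tp, area = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            area += tp / rank  # precision at each newly recalled positive
    return area / n_pos


def youden_threshold(y_true: Sequence[int], y_score: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

A useful reference point when reading AUPRC values is that a model ranking cases at random has an expected AUPRC close to the prevalence of the outcome, so the baseline is much lower on imbalanced data than the 0.5 baseline of the AUROC.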
Model interpretation
SHAP13 was used to reveal the reasons behind the model predictions, to give anesthesiologists insight into the drivers of the predictive models, and to allow these drivers to be checked against specialized medical expertise. The method uses coalitional game theory to calculate SHAP values, which quantify the marginal contribution of each variable of an individual case to the model's final prediction. In this study, we calculated the SHAP value of each variable for every case in the training set and averaged them to obtain a global interpretation of the model. In addition, several individual cases are presented to show how the ANN model arrives at each individual prediction.
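The idea of a marginal contribution averaged over coalitions can be made concrete with an exact Shapley computation on a toy model. This sketch enumerates every coalition, which is feasible only for a handful of features; practical SHAP implementations rely on approximations, and the text does not state which explainer was used. The function names, the toy model and the use of a fixed baseline to represent "absent" features are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one case, with absent features set to a baseline."""
    n = len(x)
    features = range(n)

    def value(coalition: set) -> float:
        # Features in the coalition take the case's values; the rest take the baseline.
        mixed = [x[j] if j in coalition else baseline[j] for j in features]
        return predict(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def global_importance(per_case_values: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute Shapley value of each feature across cases."""
    n_cases = len(per_case_values)
    n_features = len(per_case_values[0])
    return [sum(abs(case[j]) for case in per_case_values) / n_cases
            for j in range(n_features)]
```

For the toy model `f(v) = 2*v[0] + v[1] + v[0]*v[2]` evaluated at `[1, 2, 3]` against a zero baseline, the contributions are 3.5, 2.0 and 1.5. They sum to the difference between the prediction and the baseline output (7), which is the additivity property that lets the contributions be read as a decomposition of one prediction.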
Statistical analysis
Continuous variables with a normal distribution (by the Shapiro–Wilk test) were expressed as mean ± SEM and compared with Student's t-test, while non-normally distributed variables were described as median with interquartile range and compared with the Mann–Whitney U-test. Categorical variables were reported as counts with percentages and analyzed by the Chi-squared or Fisher’s exact test. A P-value of less than 0.05 was considered statistically significant. Statistical analyses were performed with SPSS 24.0 (IBM Corp, Armonk, NY).
Result
Study population
The basic dataset contained 1226 patients, of whom 999 eligible patients were included in the study (799 in the training set, 80%). The flow chart of patient selection is shown in Figure 1. Hypoxemia during EGD occurred in 70 patients (7.0%), of whom 57 (81.4%) had a STOP-BANG score of ≥ 3. In addition, 216 patients were included in the additional dataset (hypoxemia incidence of 8.8%). Detailed information about the patients and datasets is provided in Supplementary Tables S1, S2 and S3.
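The reported proportions can be checked directly from the counts given above. The variable names in this short check are illustrative, and the number of hypoxemia cases in the additional dataset is inferred from the reported percentage rather than stated in the text.

```python
screened, included, training = 1226, 999, 799
hypoxemia_cases, stop_bang_positive_cases = 70, 57
additional_patients, additional_incidence = 216, 0.088

excluded = screened - included
training_share = 100 * training / included
incidence = 100 * hypoxemia_cases / included
stop_bang_share = 100 * stop_bang_positive_cases / hypoxemia_cases
# Inferred, not reported: approximate case count in the additional dataset.
additional_cases = round(additional_patients * additional_incidence)

print(f"excluded: {excluded}")
print(f"training share: {training_share:.1f}%")
print(f"hypoxemia incidence: {incidence:.1f}%")
print(f"STOP-BANG >= 3 among hypoxemia cases: {stop_bang_share:.1f}%")
print(f"approximate cases in additional dataset: {additional_cases}")
```

The check reproduces the 80%, 7.0% and 81.4% figures and implies that 227 screened patients were excluded.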

Figure 1. Flow chart of patient selection.
Feature selection and model development
After selection by EN, 5 variables were retained and incorporated into the Basic-ANN model: age, BMI, neck circumference, ASA class, and baseline SpO2. For the Airway-ANN model, 10 variables were considered to be closely related to intraoperative hypoxemia and incorporated into the model: age, BMI, neck circumference, ASA class, RHTMD, TMH, RHSMD, inter-incisor distance, modified Mallampati class, and baseline SpO2. Details of the variables with non-zero coefficients after EN are given in Supplementary Table S4.
Model performance
The performance of the three prediction tools (Basic-ANN, Airway-ANN, and STOP-BANG score) on the temporal validation set in PRC and ROC space is shown in Figure 2. Airway-ANN achieved the highest AUPRC (0.532) and significantly outperformed the other two tools (P < 0.05, Table 1). In addition, both ANN models performed significantly better than the STOP-BANG score on AUPRC and AUROC (P < 0.05, Table 1). Additional information about model performance is shown in Supplementary Table S5, and the model hyperparameters are shown in Supplementary Table S6.

Figure 2. The PRC (a) and ROC (b) of the Basic-ANN model, Airway-ANN model and STOP-BANG score on the temporal validation set. Abbreviations: AUPRC, area under precision-recall curve; AUROC, area under receiver operating characteristic curve; ANN, artificial neural network.
Table 1. Comparison of the performance between the prediction models and the STOP-BANG score for predicting intraoperative hypoxemia.
AUROC, the area under the receiver operating characteristic curve; AUPRC, the area under the precision-recall curve; ANN, artificial neural network; Bootstrap CI, the values outside the parentheses represent the difference between AUPRCs (bootstrap confidence interval), and if the confidence interval does not include 0, the difference between AUPRCs is considered to be significant (P-value < 0.05); P-value, the P-values of AUROC difference between prediction models and STOP-BANG score; *P-value < 0.05.
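The bootstrap comparison described in the table note can be sketched as follows. This is an illustration under assumptions: the text does not specify the number of resamples or the type of interval, so a percentile interval with a fixed seed is used here, and the function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve."""
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    tp, area = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            area += tp / rank
    return area / sum(y_true)


def bootstrap_auprc_difference(y_true: Sequence[int], score_a: Sequence[float],
                               score_b: Sequence[float], n_boot: int = 2000,
                               alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Observed AUPRC difference (A minus B) with a percentile bootstrap interval."""
    rng = random.Random(seed)
    n = len(y_true)
    observed = average_precision(y_true, score_a) - average_precision(y_true, score_b)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        y = [y_true[i] for i in idx]
        if sum(y) == 0:
            continue  # AUPRC is undefined without any positive case
        a = average_precision(y, [score_a[i] for i in idx])
        b = average_precision(y, [score_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return observed, lower, upper
```

If the returned interval excludes 0, the difference between the two AUPRCs is read as significant at the chosen level, which is the decision rule stated in the table note.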
Model interpretation and application
We provide a global interpretation of the prediction behavior of Airway-ANN (Figure 3) by SHAP. Among the 10 variables included in the model, BMI, baseline SpO2 and neck circumference had the greatest influence on the model prediction (Figure 4(a)). Patients with higher BMI, lower baseline SpO2, and larger neck circumference had a higher risk of hypoxemia during EGD (Figure 4(b)).

Figure 3. The structure diagram of the Airway-ANN model. Abbreviations: ANN, artificial neural network; Airway-ANN, the ANN model that added airway assessment indicators on the basis of basic clinical variables; ASA, American Society of Anesthesiologists; RHTMD, height to thyromental distance; RHSMD, height to sternomental distance; TMH, thyromental height.

Figure 4. Feature importance ranking of the Airway-ANN model based on SHAP (a) and the way each feature influences the prediction result (b). (a) Variables are sorted by their mean SHAP value (abscissa) in the Airway-ANN model. (b) Each row represents a variable and each dot represents a case. The redder the dot, the higher the value of the variable for that case, and vice versa. The abscissa represents the SHAP value, where a positive value pushes the model toward predicting that the case will develop hypoxemia, and vice versa. Abbreviations: ANN, artificial neural network; Airway-ANN, the ANN model that added airway assessment indicators on the basis of basic clinical variables; ASA, American Society of Anesthesiologists; RHTMD, height to thyromental distance; RHSMD, height to sternomental distance; TMH, thyromental height; SHAP, Shapley additive explanations.
Three patients, representing a true positive, a true negative and a false negative case, were randomly selected to show how the model produced individual predictions. For the true positive case (Figure 5(a)), the model correctly predicted the risk of hypoxemia (probability of 0.283, threshold of 0.244), with a BMI of 28.3, neck circumference of 42, and RHTMD of 24 making the largest contributions to the prediction. For the true negative case (Figure 5(b)), the model correctly predicted a probability of 0.209, driven mainly by a TMH of 5.8, baseline SpO2 of 98, and RHTMD of 22.05. For the false negative case (Figure 5(c)), however, a modified Mallampati class of 0, baseline SpO2 of 98 and neck circumference of 35 led the model to underestimate the probability of hypoxemia during EGD (0.147), which indicates that these three variables were the main reasons for the wrong prediction. We also deployed the Airway-ANN model to the cloud, where other users can access, use and validate it at the following URL: http://njfh-yxb.com.cn:2022/airway_ann.
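The way a predicted probability is turned into a risk label for the three cases can be shown in a few lines. The sketch assumes that the reported threshold of 0.244 applies to all three cases and that a probability equal to the threshold counts as high risk; neither detail is stated explicitly, and the function name is hypothetical.

```python
THRESHOLD = 0.244  # operating threshold reported for the true positive case


def outcome_label(probability: float, developed_hypoxemia: bool,
                  threshold: float = THRESHOLD) -> str:
    """Classify one prediction against the observed outcome."""
    predicted_high_risk = probability >= threshold
    if predicted_high_risk and developed_hypoxemia:
        return "true positive"
    if predicted_high_risk and not developed_hypoxemia:
        return "false positive"
    if not predicted_high_risk and developed_hypoxemia:
        return "false negative"
    return "true negative"


cases = [(0.283, True), (0.209, False), (0.147, True)]
labels = [outcome_label(p, observed) for p, observed in cases]
```

The three reported probabilities map to a true positive, a true negative and a false negative, in line with the descriptions of the cases above.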

Figure 5. Individualized interpretation of Airway-ANN model predictions based on SHAP for three random cases: (a) true positive, (b) true negative and (c) false negative. Red indicates that the variable pushes the model toward a high-risk prediction of hypoxemia, blue indicates that the variable pushes the model away from predicting hypoxemia, and the length of the color bar represents the size of the contribution to the prediction. The number represents the actual value of the corresponding variable. Abbreviations: ANN, artificial neural network; Airway-ANN, the ANN model that added airway assessment indicators on the basis of basic clinical variables; ASA, American Society of Anesthesiologists; RHTMD, height to thyromental distance; RHSMD, height to sternomental distance; TMH, thyromental height; SHAP, Shapley additive explanations. ASA class values 0 to 2 represent I, II and III, respectively. Modified Mallampati class values 0 to 2 represent I, II and III/IV, respectively.
Discussion
To the best of our knowledge, this is the first interpretable ANN model that accurately predicts the risk of hypoxemia in adult (18–64) patients. The hypoxemia risk in these patients is lower than that in elderly patients, but anesthesiologists find this group difficult to manage precisely because their risk is ambiguous. Our research may help to address this problem by providing an ANN model that predicts their hypoxemia risk. In model validation, our Airway-ANN model performed significantly better than the Basic-ANN model, and both ANN models outperformed the STOP-BANG score on every metric. To further improve the accessibility of our Airway-ANN model, we deployed it to the cloud. By visiting the website, anesthesiologists can use the model directly to predict the hypoxemia risk of patients, and researchers can further validate its predictive ability.
Our model may have performed better than the STOP-BANG score because it was specifically established for predicting hypoxemia. Although most of the indicators in the STOP-BANG score have previously been found to correlate with hypoxemia, it remains a scoring model for screening obstructive sleep apnea, and the weights it assigns to each indicator may not be well suited to adult (18–64) EGD patients. Indeed, in clinical practice, anesthesiologists combine various indicators for risk assessment rather than considering the STOP-BANG score alone.9 Another reason may be the use of ANN methods. Compared with traditional statistical methods, ANNs excel at discovering potential relationships among multiple variables.11,12
Unexplained predictions would limit the application of the model in clinical practice. Without understanding which factors affect the predicted results, anesthesiologists cannot reduce hypoxemia risk by adjusting their strategies accordingly. We therefore introduced the SHAP algorithm to show anesthesiologists succinctly how each feature contributed to the prediction of our Airway-ANN model.13,14 Although the contributions of these features do not necessarily reflect causal relationships, the interpretation could still help anesthesiologists to carry out targeted individual interventions and bring better clinical applicability to our model.
In addition, we ranked the overall importance of all factors included in the Airway-ANN model through this algorithm to facilitate analysis and discussion. BMI3,16 and age22,23 showed a strong association with hypoxemia, consistent with many previous studies. Another important factor was baseline SpO2, which still had an important impact on the predicted outcome even after patients with a baseline SpO2 < 95% were excluded. This might suggest that measures to increase baseline SpO2 before the operation could effectively reduce patients’ risk of hypoxemia. It is worth noting that neck circumference, one of the indicators of the STOP-BANG score, also showed strong importance in the Airway-ANN model. A recent study also suggested that neck circumference could be used as a hypoxia screening method for patients undergoing endoscopy.24 In addition, we found that patients with higher ASA classes were more likely to develop hypoxemia, similar to the finding of a recent large multi-center registry study.25 However, the relative importance of ASA class in our model was low, which may be related to our study population: for outpatients undergoing EGD, the anesthesiologist would decline sedation requests from ASA class IV–V patients to ensure the safety of the operation.
Comparing the predictive power of the Basic-ANN and Airway-ANN models, we found that introducing airway assessment indicators significantly improved the predictive performance of the ANN model. This was also confirmed in the feature importance ranking based on the SHAP algorithm. In total, four airway assessment indicators were included in the Airway-ANN model, among which modified Mallampati class was the most important, while the importance of inter-incisor distance was very weak. Baillard et al.26 found that anticipated difficult mask ventilation and anticipated difficult tracheal intubation were independent risk factors for hypoxemia in their prospective study. Some previous studies also suggested that modified Mallampati class is associated with hypoxemia.17,27,28 This result supports the recommendation that anesthesiologists should conduct an airway assessment before sedation,9 and suggests that anesthesiologists need to make difficult airway plans in advance.
This study has several limitations. First, it is a retrospective study, and the features collected in our dataset were chosen on the basis of the anesthesiologists’ experience and existing research and do not include all available factors, so some potential factors may have been missed. Second, our analysis does not take into account variables such as sedation dose and operation duration, since our aim was to predict intraoperative hypoxemia from preoperative indicators. Third, our sample size is relatively small, and no data from other regions or ethnic groups were available for validation, which may limit the generalizability of our model to other centers. Therefore, large-scale multi-center prospective cohort studies are still required.
Conclusions
Our interpretable Airway-ANN model achieved a satisfactory ability to distinguish the hypoxemia risk in adult (18–64) patients undergoing EGD. In addition, introducing airway assessment indicators substantially improved on the Basic-ANN model. With the SHAP algorithm and cloud deployment, our Airway-ANN model has the potential to be applied in the clinic to help anesthesiologists formulate sedation plans and to reduce their workload.
Supplemental Material
Supplemental material, sj-docx-1-dhj-10.1177_20552076231180522 for An interpretable artificial neural network model for predicting hypoxemia via an online tool in adult (18–64) patients during esophagogastroduodenoscopy by Weigen Xiong, Daizun Zou, Zhaojing Fang, Xiuxiu Zhao, Chen Chen, Jianjun Zou and Yanna Si in DIGITAL HEALTH
Footnotes
Acknowledgments
The authors would like to thank the editors and reviewers for their assistance and guidance in this research.
Contributorship
WX: Conception and design, acquisition of data, and drafting the article. DZ: Acquisition of data, analysis and interpretation of data, and drafting the article. ZF: Analysis and interpretation of data, and revising it critically for important intellectual content. XZ: Acquisition of data and revising it critically for important intellectual content. CC: Acquisition of data and revising it critically for important intellectual content. JZ: Conception and design, and revising it critically for important intellectual content. YS: Conception and design, and revising it critically for important intellectual content. All authors read and approved the final manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
This study was approved by the Nanjing First Hospital ethics committee, and the requirement for informed consent was waived (approval number: KY20220509-01-KS-01).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant numbers 82173899 and 81873954), the Jiangsu Pharmaceutical Association (grant numbers H202108 and A2021024), and the Six Talent Peaks Project in Jiangsu Province (grant number WSW-106).
Guarantor
JZ.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplemental material
Supplemental material for this article is available online.
References
