Abstract
Background:
Diabetic foot ulcers (DFUs) are serious complications of diabetes that can lead to lower extremity amputations (LEAs). Risk prediction models can identify high-risk patients who may benefit from early intervention. Machine learning (ML) methods have shown promising utility in medical applications, and explainable modeling can aid their integration into clinical practice and their acceptance. This study aims to develop an explainable ML-based risk prediction model for LEA in patients with DFU.
Methods:
This study is a retrospective review of 2559 inpatient DFU episodes in a tertiary institution from 2012 to 2017. Fifty-one features, including patient demographics, comorbidities, medication, wound characteristics, and laboratory results, were reviewed. Outcome measures were the risk of major LEA, minor LEA, and any LEA. Machine learning models were developed for each outcome, and model performance was evaluated using the area under the receiver operating characteristic (ROC) curve, balanced-accuracy, and F1-score. SHapley Additive exPlanations (SHAP) was applied to interpret the models.
Results:
Model performance for prediction of major, minor, and any LEA achieved areas under the ROC curve (AUROC) of 0.820, 0.637, and 0.756, respectively, with XGBoost, XGBoost, and Gradient Boosted Trees as the best-performing algorithms. Using SHAP, key features that contributed to the predictions were identified for explainability. Total white cell (TWC) count, comorbidity score, and red blood cell count contributed most to the prediction of major LEA. Total white cell count, eosinophils, and necrotic eschar in the wound contributed most to the prediction of any LEA.
Conclusions:
Machine learning algorithms performed well in predicting the risk of LEA in patients with DFU. Explainability can provide clinical insights and help identify at-risk patients for early intervention.
Keywords
Introduction
Diabetic foot ulcers (DFUs) impose a significant clinical and economic burden on health care systems globally and diminish the quality of life of affected individuals. One study reported that the average cost per patient-year for DFU patients ranges from US $3368 to US $30 131, depending on the severity of amputation. 1 In addition, DFU can lead to lower extremity amputation (LEA); approximately 80% of diabetes-related LEAs are estimated to be preceded by DFU. 2 In a retrospective observational study conducted in Singapore, the proportion of major LEAs attributed to diabetes increased from 63.6% in 2008 to 81.7% in 2017. 3
The cause of DFU is complex and multifactorial. 4 Diabetes is associated with both microvascular and macrovascular complications. Peripheral neuropathy leads to repetitive stress over pressure points in the foot. Peripheral vascular disease of the lower limbs contributes to poor wound healing. Diabetic retinopathy can cause poor vision and lead to unexpected trauma to the foot. Poorly controlled diabetes is often associated with polymicrobial infections, and an infected ulcer may spread to the underlying bone, leading to osteomyelitis. 5 If treatment and wound care fail to heal the wound, it can progress and LEA may become necessary. 6
Preventing diabetic foot amputation starts with the prevention and treatment of DFU. 7 Such measures include sharp debridement, offloading, and revascularization.4,8 If risk is recognized early, appropriate care plans can be formulated soon after the wound occurs to prevent worsening of the DFU and subsequent diabetic foot amputation. 9 Research has also shown that early revascularization and offloading can improve limb salvage rates.8,10-12
Accurate prediction of a diabetic foot amputation event would allow clinicians to address the patient's risks proactively and to implement treatment plans and preventive measures at the early stages of DFU, potentially leading to significant cost savings for both individuals and health care systems.
Previous research in this area has used conventional statistical methods such as logistic regression, survival analysis, or Bayesian statistics to identify precursors and risk factors of diabetic foot amputation.13-15 However, such methods may not capture complex nonlinear relationships between the features, which can compromise prediction accuracy. 16
Machine learning (ML) algorithms have demonstrated the ability to capture complex, nonlinear patterns. Such algorithms can process data in higher-dimensional spaces and may improve prediction accuracy compared with traditional statistical models.17,18 In recent years, ML has been applied successfully to many problems in health care,19-21 including the prediction and diagnosis of DFU and its complications, such as LEA. 22
Other studies have also applied ML to the prediction of diabetic foot amputation. However, the sample sizes of some of these studies are relatively small,23-25 and might not be fully representative of the population. Stefanopoulos et al 26 used a large sample of more than 300 000 patients, although their prediction horizon was major amputation within the hospitalization episode, whereas this study uses a 180-day horizon from the time of admission.
In addition, some of these studies differ in other respects, such as focusing on a single amputation outcome (minor or major amputation)23,24,26 or using only one type of ML algorithm to develop the model.25,26 Other studies used a variety of ML algorithms, but their predictions may not be explainable owing to model complexity,23,24 which can limit interpretation by patients or health care providers. 27 Table 1 summarizes the studies that employed various ML algorithms for the prediction of diabetic foot amputation.
Summarized Studies of Prediction of Diabetic Foot Amputations Using ML Algorithms.
Abbreviations: ML, machine learning; BPNN, back propagation neural network; AUC, area under curve; NPV, negative predictive value; PPV, positive predictive value; CTREE, conditional inference trees; CI, confidence interval; LEA, lower extremity amputation.
Models are considered explainable owing to the application of Explainable AI (XAI) techniques or the nature of the algorithm itself (e.g., visualization of a decision tree).
This study aims to use ML algorithms to develop models that predict the risk of LEA in patients with DFU with good performance. A large repertoire of ML algorithms was trained, and their performance was evaluated and compared. The larger sample size and wider variety of features collected are expected to improve the generalizability of the models. For comprehensiveness, three ML models were developed to examine the key risk factors of each amputation outcome (major, minor, and any LEA). Finally, a model-agnostic explainability method was employed to interpret the output of the ML models, which may generate new insights into factors that contribute to the risk of diabetic foot amputation. The interpretability of the models may improve understanding of the predictions and increase the uptake of ML models to aid clinical decisions.
Methods
Data Collection and Study Design
A comprehensive data set was retrospectively collected from electronic medical records from 2012 to 2017. The data set included 5043 in-hospital episodes of 2522 unique patients admitted for DFU to Tan Tock Seng Hospital, Singapore. Episodes of patients younger than 18 years and episodes with missing wound characteristics were excluded from the study. The collected features included demographics, wound characteristics, laboratory results, medication history, and comorbidities; diagnoses and comorbidities were identified using ICD-10-AM diagnosis codes. 28 The list of ICD codes used to identify patients with DFU and their respective comorbidities can be found in Supplemental Appendix A. Wound characteristics were extracted as discrete data elements from the institution's wound-specific electronic medical records for inpatient wounds. The list of wound characteristics can be found in Supplemental Appendix B.
Episodes in which the patient underwent an amputation within 180 days of the admission date were classified as the case group. Episodes in which the patient did not undergo an amputation within 180 days were classified as the control group. Control episodes in which the patient died within 180 days of the admission date were excluded.
Three amputation outcomes were collected: (1) major amputation, (2) minor amputation, and (3) either major or minor amputation (any LEA). Major amputation was defined as any amputation above the ankle, and minor amputation as any amputation below the ankle.
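The case/control labeling rule can be sketched as a small Python function. This is a hypothetical illustration, not the study's code: the `Episode` record and `label_episode` helper are invented names, each episode is assumed to carry at most one amputation already coded as "major" or "minor", and, for the major and minor outcomes, an amputation of the other type within the horizon is assumed to count as a control (the text does not state how such episodes were handled).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

HORIZON = timedelta(days=180)  # prediction window counted from the admission date
OUTCOMES = {"major", "minor", "any"}


@dataclass
class Episode:
    """Hypothetical inpatient episode record (illustrative field names)."""
    admission: date
    amputation_date: Optional[date] = None
    amputation_level: Optional[str] = None  # "major" (above ankle) or "minor" (below ankle)
    death_date: Optional[date] = None


def label_episode(ep: Episode, outcome: str) -> Optional[int]:
    """Return 1 for a case, 0 for a control, or None if the episode is excluded."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    end = ep.admission + HORIZON
    amputated_in_window = (
        ep.amputation_date is not None and ep.admission <= ep.amputation_date <= end
    )
    # Case: an amputation of the requested type (any type for "any") within 180 days.
    if amputated_in_window and outcome in ("any", ep.amputation_level):
        return 1
    # Would-be controls who died within 180 days of admission are excluded.
    if ep.death_date is not None and ep.death_date <= end:
        return None
    return 0
```

Calling `label_episode` once per outcome yields the three labeled cohorts; cases are retained even if the patient later died, since the text only describes excluding deaths from the control group.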
Three ML models were developed, one for each outcome, each with the objective of predicting the risk of an amputation event within 180 days of the patient's admission date. The participant flow diagram for each outcome is shown in Figures 1 to 3.

Participation flow diagram for (1) major amputation.

Participation flow diagram for (2) minor amputation.

Participation flow diagram for (3) any lower extremity amputation.
Statistical Analysis
Descriptive analysis was performed for all three amputation outcomes. Continuous features were first tested for normality using the Shapiro–Wilk test. 29 Normally distributed features are presented as mean ± standard deviation, and nonnormally distributed features as median with interquartile range. To test for statistically significant differences between the case and control groups, the t-test was used for normally distributed features 30 and the Mann–Whitney U test for nonnormally distributed features. 31
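The per-feature test selection described above can be sketched with SciPy. This is an illustrative sketch, not the study's code: `compare_continuous` is a hypothetical helper, and it assumes normality is checked separately in each group (both must pass at alpha = 0.05) and that the two-sample t-test uses SciPy's default equal-variance form; the text does not specify these details.

```python
import numpy as np
from scipy import stats


def _mean_sd(g):
    return f"{g.mean():.2f} ± {g.std(ddof=1):.2f}"


def _median_iqr(g):
    q1, med, q3 = np.percentile(g, [25, 50, 75])
    return f"{med:.2f} (IQR {q1:.2f} to {q3:.2f})"


def compare_continuous(cases, controls, alpha=0.05):
    """Summarize one continuous feature by group and test the difference.

    Both groups pass Shapiro-Wilk (p > alpha): mean ± SD and t-test.
    Otherwise: median (IQR) and two-sided Mann-Whitney U test.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    normal = all(stats.shapiro(g).pvalue > alpha for g in (cases, controls))
    if normal:
        describe = _mean_sd
        p = stats.ttest_ind(cases, controls).pvalue
        test = "t-test"
    else:
        describe = _median_iqr
        p = stats.mannwhitneyu(cases, controls, alternative="two-sided").pvalue
        test = "Mann-Whitney U"
    return test, describe(cases), describe(controls), float(p)
```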
Categorical variables are expressed as counts (n) with percentages (%), and the Pearson chi-square test was used to evaluate differences between the case and control groups. 32 A p value of <.05 was considered statistically significant. Statistical analysis of the baseline characteristics of the patient cohorts can be found in Supplemental Appendix C. All statistical analyses were performed using the statsmodels (Version 0.14.0) and SciPy (Version 1.11.1) Python packages.
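For a categorical feature, the Pearson chi-square comparison can be sketched with `scipy.stats.chi2_contingency` on a table with one row per group. The helper name `compare_categorical` is hypothetical, and passing `correction=False` is an assumption made so that the statistic is the uncorrected Pearson chi-square; by default SciPy applies Yates' continuity correction to 2 × 2 tables, and the text does not say which variant was used.

```python
import numpy as np
from scipy.stats import chi2_contingency


def compare_categorical(case_counts, control_counts):
    """Pearson chi-square test on a 2 x k table (row 0: cases, row 1: controls).

    Returns the statistic, p value, degrees of freedom, and the within-group
    percentages used for the n (%) presentation.
    """
    table = np.array([case_counts, control_counts])
    # correction=False: plain Pearson statistic, no Yates correction for 2 x 2 tables.
    chi2, p, dof, _expected = chi2_contingency(table, correction=False)
    pct = 100 * table / table.sum(axis=1, keepdims=True)
    return chi2, p, dof, pct
```

For example, a binary feature present in 30 of 100 cases and 20 of 100 controls gives a statistic of 8/3 with 1 degree of freedom, which is not significant at the .05 level.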
Data Preprocessing and Engineering
Features with more than 20% missing data were excluded from modeling. For the remaining features, missing values were assumed to be missing at random and were imputed using multiple imputation by chained equations (MICE). 33
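The missing-data rule and the idea behind chained equations can be sketched in NumPy. This is a deliberately simplified stand-in, not the study's procedure: `drop_sparse_columns` and `chained_equations_impute` are hypothetical helpers, and the imputation is a single, deterministic round-robin of least-squares regressions (each incomplete feature predicted from all others). Full MICE instead draws from predictive distributions and produces several imputed data sets whose results are pooled; library implementations such as statsmodels' `MICEData` are more complete.

```python
import numpy as np


def drop_sparse_columns(X, max_missing=0.20):
    """Drop features whose fraction of NaNs exceeds max_missing."""
    frac_missing = np.isnan(X).mean(axis=0)
    keep = frac_missing <= max_missing
    return X[:, keep], keep


def chained_equations_impute(X, n_rounds=10):
    """Single, deterministic chained-equations imputation (simplified MICE).

    Missing entries start at their column mean; each incomplete column is then
    re-predicted from every other column by least squares, for n_rounds rounds.
    """
    X = np.array(X, dtype=float)  # copy so the caller's array is untouched
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.nonzero(missing)
    X[rows, cols] = col_means[cols]
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Design matrix: intercept plus all other (currently imputed) columns.
            A = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(A[~miss_j], X[~miss_j, j], rcond=None)
            X[miss_j, j] = A[miss_j] @ coef
    return X
```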
The data set was split into a train set and a test set in a 70:30 ratio. The train set was used for model development, while the test set was held out to evaluate model performance. Continuous features were standardized using the mean and standard deviation computed on the train set, and the same values were then applied to transform the test set. Categorical features were one-hot encoded. After preprocessing, 51 features were used for modeling; a list of these features can be found in Supplemental Appendix B.
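The split-then-standardize order matters because test-set statistics must not leak into training. A minimal NumPy sketch of the steps described above follows; the helper names are hypothetical, and the split is assumed to be a simple random permutation (the text does not say whether it was stratified).

```python
import numpy as np


def train_test_split_70_30(n_samples, seed=0):
    """Random 70:30 split of row indices (no stratification assumed)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(0.7 * n_samples))
    return idx[:n_train], idx[n_train:]


def fit_standardizer(X_train):
    """Mean and SD estimated on the train set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std


def standardize(X, mean, std):
    """Apply train-set statistics to any split (train or test)."""
    return (X - mean) / std


def one_hot(values, categories):
    """One-hot encode against categories learned from the train set.

    Values not seen in training map to an all-zero row.
    """
    values = np.asarray(values)
    return (values[:, None] == np.asarray(categories)[None, :]).astype(float)
```

In use, `fit_standardizer` is called once on `X[train_idx]`, and the returned `mean` and `std` are passed to `standardize` for both `X[train_idx]` and `X[test_idx]`.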
Model Development
A total of 8 ML algorithms were applied to train the models; each algorithm is briefly described in Table 2.
List of ML Algorithms Applied.
Abbreviations: ML, Machine Learning; LASSO, Least Absolute Shrinkage and Selection Operator; SVM, Support Vector Machine.
Each ML model was trained on the train set, and its hyperparameters were tuned using Bayesian optimization44,45 to find the hyperparameters that give the best model performance. Each candidate set of hyperparameters was evaluated using repeated stratified five-fold cross-validation with six repeats. Bayesian optimization uses the results of past iterations to update its model of the objective and make informed decisions about which set of hyperparameters to evaluate next, iteratively exploring the parameter space. The process stops when it reaches a maximum number of iterations or when an early stopping criterion is met.
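Repeated stratified five-fold cross-validation with six repeats produces 30 train/validation splits, each preserving the class ratio. The NumPy sketch below is illustrative only (`repeated_stratified_kfold` is a hypothetical helper); scikit-learn's `RepeatedStratifiedKFold(n_splits=5, n_repeats=6)` provides the same scheme, although the text does not say which implementation was used. In a tuning loop, each hyperparameter set proposed by Bayesian optimization would typically be scored by its mean metric over these 30 splits, which is an assumption about the setup.

```python
import numpy as np


def repeated_stratified_kfold(y, n_splits=5, n_repeats=6, seed=0):
    """Yield (train_idx, val_idx) pairs: n_splits folds x n_repeats reshuffles.

    Each class is shuffled and dealt round-robin across folds, so every
    validation fold keeps roughly the overall class proportions.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        fold_of = np.empty(len(y), dtype=int)
        offset = 0  # continue the round-robin across classes to balance fold sizes
        for cls in np.unique(y):
            cls_idx = rng.permutation(np.flatnonzero(y == cls))
            fold_of[cls_idx] = (np.arange(len(cls_idx)) + offset) % n_splits
            offset += len(cls_idx)
        for k in range(n_splits):
            yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)
```

With 100 episodes at a 27% case rate, every validation fold holds 20 episodes containing 5 or 6 cases.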
Model Evaluation
Model performance was evaluated using the following metrics: area under the receiver operating characteristic curve (AUROC), F1-score, and balanced-accuracy. The AUROC represents the degree of separability between the two classes; 46 accuracy measures how often the model predicts the correct class; and the F1-score is the harmonic mean of precision and sensitivity (recall). 47
Unlike accuracy, which may be biased toward the majority class, balanced-accuracy averages the accuracy obtained within each class. This makes balanced-accuracy a robust metric for imbalanced class problems, 48 which is appropriate for evaluating the major amputation and minor amputation models (26.7% and 27.4% incidence, respectively). The equation below shows the formula for balanced-accuracy, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives:
$$\text{Balanced-accuracy} = \frac{1}{2}\left(\frac{TP}{TP+FN} + \frac{TN}{TN+FP}\right)$$
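The three evaluation metrics can be written out directly to show what each one rewards. This NumPy sketch uses hypothetical helper names; in practice `sklearn.metrics` offers `roc_auc_score`, `f1_score`, and `balanced_accuracy_score`. The AUROC here uses its rank interpretation: the probability that a randomly chosen case is scored above a randomly chosen control, with ties counted as one half.

```python
import numpy as np


def auroc(y_true, scores):
    """P(score of a random case > score of a random control); ties count 1/2."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    diff = s[y][:, None] - s[~y][None, :]  # every case-control pair
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def confusion(y_true, y_pred):
    y = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    return np.sum(y & p), np.sum(~y & ~p), np.sum(~y & p), np.sum(y & ~p)  # TP, TN, FP, FN


def f1(y_true, y_pred):
    """Harmonic mean of precision and sensitivity: 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

The benefit for imbalanced data is easy to see: with a 27% case rate, a model that always predicts "no amputation" has 73% accuracy but a balanced-accuracy of only 0.5.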
The model with the best set of evaluation metrics is then selected.
Model Explanation
The SHapley Additive exPlanations (SHAP) algorithm, one of the most popular model-agnostic methods for model explanation, was applied to interpret the model results. It uses concepts from cooperative game theory to quantify the contribution of each feature to the final prediction. SHAP estimates the average marginal contribution of each feature, which is used as a form of feature importance. 49
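The "average marginal contribution" can be made concrete by computing exact Shapley values for a toy model: average the change in prediction from adding a feature over all coalitions of the other features, weighted by |S|!(n - |S| - 1)!/n!. This brute-force sketch is illustrative only. The hypothetical `shapley_values` helper treats "absent" features as set to a single baseline value, and its cost grows as 2^n. The SHAP package instead uses efficient estimators (for example `TreeExplainer` for tree ensembles) and typically averages over background data; which explainer the authors used is not stated.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline value."""
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their real values.
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model the values reduce to coefficient × (feature value − baseline), and for any model they sum to f(x) − f(baseline), which is the additivity property that gives SHAP its name.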
Packages Used
Python 3 was used for both statistical analysis and modeling;41,43,45,49-55 a list of the packages used and their functions can be found in Supplemental Appendix D.
Results
Model Performance
For each ML algorithm, the best model trained on the train set is reported with its mean performance and 95% confidence interval. The best-performing model of each ML algorithm was then evaluated on the test set.
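The text does not state how the 95% confidence intervals were computed. One common choice, assumed here, is a t-based interval over the cross-validation fold scores, sketched below with the hypothetical helper `mean_ci`. Fold scores from repeated cross-validation share training data and are therefore not independent, so such an interval should be read as approximate.

```python
import numpy as np
from scipy import stats


def mean_ci(scores, confidence=0.95):
    """Mean and t-based confidence interval of cross-validation fold scores."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    half = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * sem
    return mean, mean - half, mean + half
```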
The best-performing model for predicting major amputation was XGBoost, with an AUROC and balanced-accuracy of 0.820 and 0.749, respectively. This indicates that the model discriminates well between the two classes and performs consistently on the minority class. For any LEA, Gradient Boosted Trees performed best, with an AUROC of 0.756 and balanced-accuracy of 0.684.
However, the minor amputation model performed worse than the other two models; its best-performing algorithm, XGBoost, achieved an AUROC and balanced-accuracy of 0.637 and 0.601, respectively. Model performance on the train set is shown in Figures 4 to 6, and Tables 3 to 5 show the performance of the models on the test set. Detailed values of model performance can be found in Supplemental Appendix E.

Model performance on the train set of (1) major amputation model.

Model performance on the train set of (2) minor amputation model.

Model performance on the train set of (3) any lower extremity amputation model.
Model Results for the (1) Major Amputation Model.
Abbreviations: ML, machine learning; AUROC, area under the receiver operating characteristic curve.
Model Results for the (2) Minor Amputation Model.
Abbreviations: ML, machine learning; AUROC, area under the receiver operating characteristic curve.
Model Results for the (3) Lower Extremity Amputation Model.
Abbreviations: ML, machine learning; AUROC, area under the receiver operating characteristic curve.
Model Explainability by SHAP
Figures 7 to 9 show the feature importance charts, calculated using SHAP values, for the top 12 features used by each model. A larger SHAP-based importance indicates that the feature contributes more to the model's predictions. Figures 10 to 12 depict the beeswarm plots of the models, which show the impact of individual features on individual model predictions.
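A SHAP feature-importance ranking is commonly computed as the mean absolute SHAP value of each feature across all explained samples. The sketch below assumes that convention (the text does not state the aggregation explicitly); `mean_abs_shap_ranking` is a hypothetical helper, and the test values are made up rather than taken from the study. A beeswarm plot, by contrast, shows every individual (sample, feature) SHAP value without aggregation.

```python
import numpy as np


def mean_abs_shap_ranking(shap_values, feature_names, top_k=12):
    """Rank features by mean |SHAP| over samples (rows: samples, columns: features)."""
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]  # largest importance first
    return [(feature_names[i], float(importance[i])) for i in order]
```

Taking the absolute value before averaging prevents positive and negative contributions from cancelling, so a feature that strongly raises risk for some patients and lowers it for others still ranks as important.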

Feature importance (using SHAP values) of the (1) major amputation model.

Feature importance (using SHAP values) of the (2) minor amputation model.

Feature importance (using SHAP values) of the (3) any lower extremity amputation model.

Beeswarm plot (using SHAP values) of the (1) major amputation model.

Beeswarm plot (using SHAP values) of the (2) minor amputation model.

Beeswarm plot (using SHAP values) of the (3) any lower extremity amputation model.
Discussion
Model Performance
The major amputation prediction model demonstrated good predictive performance, achieving an AUROC and balanced-accuracy of 0.820 and 0.749, respectively, with the XGBoost algorithm. This is consistent with the study by Stefanopoulos et al, 26 who reported similar performance in predicting the risk of major amputation (AUROC of 0.84 and accuracy of 77.8%). The two studies differ in the timepoint of prediction: this study predicts a major amputation event within 180 days of admission, whereas Stefanopoulos et al predicted a major amputation event during the hospital stay. The results of both studies complement each other in highlighting risk factors for predicting major amputation from clinical data.
The minor amputation model did not perform as well as the major amputation model, with an AUROC and balanced-accuracy of 0.637 and 0.601, respectively. Wang et al 24 specifically used ML to predict minor amputation in patients with poor wound status (University of Texas Grade 3 and above), achieving an AUC of 0.881. In contrast, our study did not apply strict inclusion criteria and included all patients with DFU regardless of severity. Wang et al 24 also used the Synthetic Minority Oversampling TEchnique (SMOTE) to oversample the minority class in the initial data set, which can reduce class imbalance but may also lead to overfitting, especially when case numbers are small. We did not use oversampling; multiple imputation by chained equations (MICE) was used only to handle missing data. Statistical analysis of our study cohort (found in Supplemental Appendix C) showed that the minor amputation and no minor amputation groups were not significantly different from each other, which may explain why the model had difficulty discriminating between the two groups and hence its poorer performance. Because this poorer performance appears to stem from a lack of discrimination between the features of the two groups rather than from class imbalance, oversampling techniques might not overcome this limitation.
The clinical care model for DFUs in the institution uses an inpatient multidisciplinary care pathway (comprising vascular surgeons, orthopedic surgeons, endocrinologists, rehabilitation physicians, nurses, podiatrists, orthotists, physiotherapists, social workers, and case managers). Although the pathway aims to reduce variation in care, differences in practice patterns across specialty teams might also contribute to poor model discrimination. Further studies with larger sample sizes and additional features may improve model performance.
These results suggest that ensemble learning models perform well in this context, as the best-performing models for all three outcomes were boosting algorithms (XGBoost for major and minor amputation, and Gradient Boosted Trees for any LEA). Other studies in this context have also reported success with ensemble learning models.24,56 Machine learning methods have the advantage of evaluating all available features, whereas traditional statistical approaches select variables based on presumed association. 57
Adding balanced-accuracy as an evaluation metric showed that the model predicts both the case and control groups with consistent accuracy, especially for major amputation, which supports its reliability for implementation.
Features Picked up by SHAP for Explainability
Features used by the ML models and explained by SHAP have been identified as key factors associated with the risk of diabetic foot amputation. For the major amputation prediction model, white blood cell (WBC) count being the most important feature is not unexpected, as a higher WBC count is typically associated with infection. 13 Likewise, for the Charlson Comorbidity Index (CCI) score, a patient with more comorbidities is likely to have more diabetes-associated complications, greater disease severity, and poorer limb salvage outcomes, leading to a higher risk of major amputation. 58 Red blood cell (RBC) count is not a common predictor of diabetic foot amputation, although anemia (low RBC count) has been reported to be significantly associated with poor wound healing and amputation. 59 Antiplatelet use, such as aspirin and adenosine diphosphate (ADP) receptor inhibitors (e.g., clopidogrel [Plavix]), was also identified as an important feature, likely reflecting underlying coronary artery disease, cerebrovascular disease, or peripheral vascular disease.60-62 Wound characteristics such as necrotic eschar, osteomyelitis, depth, width, and length were also identified as important features in the different models, with necrotic eschar being the third most important feature in the model predicting any LEA. This is consistent with studies reporting that wound features are important predictors of long-term wound healing. 63 The features above have also been highlighted as important risk factors of diabetic foot amputation in other clinical studies, showing coherence between the model output and clinical interpretation.
Interestingly, red cell distribution width (RDW) was a top feature in the minor amputation prediction model and the sixth most important feature in the major amputation prediction model. RDW is typically used as a clue in identifying nutritional deficiencies (e.g., iron, vitamin B12, and folate) in anemia; however, some studies have also found RDW to be independently associated with worse outcomes and complications in diabetes. 64 Higher RDW values have been independently associated with vascular complications in diabetes, 65 and this has also been observed in the Singapore population. 66 The exact pathophysiology is unknown but is thought to be related to the increased inflammatory burden in diabetes. This highlights the value of applying ML models to large data sets, which can surface features that might not be considered important a priori and might therefore be excluded from traditional statistical modeling.
The key features identified by SHAP differ somewhat between the major and minor amputation prediction models, which suggests that future treatment protocols to reduce the risk of major or minor amputation may need to be tailored accordingly.
Limitations
The prediction model is limited to the features used to train it. Although these features are comprehensive, there may be other useful predictors of diabetic foot amputation that were not captured in this study, such as the patient's nutritional status 67 and radiological imaging of the wound. Wound staging systems such as the Wagner classification and the Wound, Ischemia, and foot Infection (WIfI) score were not in routine use during the study period and were therefore not included. As the treatment of DFU is a multidisciplinary effort, including data from other clinical disciplines involved in the management of DFU (such as podiatric and orthotic interventions) might produce a more holistic model.
Owing to the retrospective nature of the study, missing data are inevitable and are a limitation of our study. Inpatient episodes with missing wound characteristic data were excluded; although this missingness was similar between the case and control groups, the exclusion may still introduce bias. The use of imputed values for other missing data may also bias the models. Moreover, the features used for modeling were collected at a single time point, at admission. The features would be more informative if collected at multiple time points and in different care settings, such as outpatient and step-down care, and including them might improve prediction performance.
Future Work
Future work could include collecting data at multiple time points to track the trajectory of amputation risk. Prospective studies could also be used to validate the model and its accuracy, as practice patterns and clinical outcomes may change over time. Other forms of data, such as wound images and radiological imaging characteristics, could be incorporated to improve the performance of the prediction model. Including wound staging systems such as the WIfI score or Wagner classification, and evaluating the added value of ML models on top of existing staging systems, would also be helpful. Training a predictive model to ingest multiple data modalities may require deep learning methods such as multimodal deep learning, which can also be considered in the future.68,69 Integrating the model into the institution's electronic medical record system as an amputation risk predictor could also help provide personalized care.
Conclusion
In summary, this study applied a wide variety of ML algorithms to a comprehensive data set with a large sample size to predict the risk of LEA within 180 days of admission in patient episodes with DFU. Three ML models were developed to understand the key factors of each amputation outcome (major, minor, and any LEA).
Machine learning techniques were shown to be effective in predicting the risk of amputation in patients with DFU, especially major amputation, with an AUROC of 0.820. The balanced-accuracy metric showed that the model achieves consistent accuracy in predicting both classes, supporting its reliability for implementation. Explainable modeling such as SHAP provides insights into the contribution of each feature to the model's output and makes model deployment more transparent and feasible. Future work can include additional features and prospective data to make the prediction model more comprehensive and holistic.
Supplemental Material
sj-docx-1-dst-10.1177_19322968241228606 – Supplemental material for Risk Prediction of Diabetic Foot Amputation Using Machine Learning and Explainable Artificial Intelligence
Supplemental material, sj-docx-1-dst-10.1177_19322968241228606 for Risk Prediction of Diabetic Foot Amputation Using Machine Learning and Explainable Artificial Intelligence by Chien Wei Oei, Yam Meng Chan, Xiaojin Zhang, Kee Hao Leo, Enming Yong, Rhan Chaen Chong, Qiantai Hong, Li Zhang, Ying Pan, Glenn Wei Leong Tan and Malcolm Han Wen Mak in Journal of Diabetes Science and Technology
Footnotes
Abbreviations
AUROC, area under the receiver operating characteristic curve; CCI, Charlson Comorbidity Index; DFU, diabetic foot ulcers; LEA, lower extremity amputations; MICE, multiple imputation by chained equations; ML, machine learning; ROC, receiver operating characteristic; SHAP, SHapley Additive exPlanations; SMOTE, synthetic minority oversampling technique.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material is available in the online version of the article.
References
