Risk Prediction of Diabetic Foot Amputation Using Machine Learning and Explainable Artificial Intelligence

Abstract

Background:

Diabetic foot ulcers (DFUs) are serious complications of diabetes which can lead to lower extremity amputations (LEAs). Risk prediction models can identify high-risk patients who can benefit from early intervention. Machine learning (ML) methods have shown promising utility in medical applications. Explainable modeling can help its integration and acceptance. This study aims to develop a risk prediction model using ML algorithms with explainability for LEA in DFU patients.

Methods:

This study is a retrospective review of 2559 inpatient DFU episodes in a tertiary institution from 2012 to 2017. Fifty-one features including patient demographics, comorbidities, medication, wound characteristics, and laboratory results were reviewed. Outcome measures were the risk of major LEA, minor LEA and any LEA. Machine learning models were developed for each outcome, with model performance evaluated using receiver operating characteristic (ROC) curves, balanced-accuracy and F1-score. SHapley Additive exPlanations (SHAP) was applied to interpret the model for explainability.

Results:

Model performance for prediction of major, minor, and any LEA event achieved an area under the ROC curve (AUROC) of 0.820, 0.637, and 0.756, respectively, with the XGBoost, XGBoost, and Gradient Boosted Trees algorithms giving the best results for each outcome, respectively. Using SHAP, key features that contributed to the predictions were identified for explainability. Total white cell (TWC) count, comorbidity score, and red blood cell count contributed the most to predictions of a major LEA event. Total white cell count, eosinophils, and necrotic eschar in the wound contributed the most to predictions of any LEA event.

Conclusions:

Machine learning algorithms performed well in predicting the risk of LEA in a patient with DFU. Explainability can help provide clinical insights and identify at-risk patients for early intervention.

Keywords

diabetes, diabetic foot ulcer, lower extremity amputation, machine learning, model explainability, SHapley Additive exPlanations, wounds

Introduction

Diabetic foot ulcers (DFUs) impose a significant clinical and economic burden on health care systems globally, while diminishing the quality of life of affected individuals. A study has shown that the average cost per patient-year for DFU patients ranges from US $3368 to US $30 131, depending on the severity of amputation.¹ In addition, DFU can lead to lower extremity amputation (LEA). It is estimated that approximately 80% of diabetes-related LEAs are preceded by DFU.² In a retrospective observational study conducted in Singapore, the proportion of major LEA attributed to diabetes increased from 63.6% in 2008 to 81.7% in 2017.³

The cause of DFU is complex and multifactorial.⁴ Diabetes is associated with both microvascular and macrovascular complications. Peripheral neuropathy leads to repetitive stress over pressure points in the foot. Peripheral vascular disease of the lower limbs contributes to poor wound healing. Diabetic retinopathy can cause poor vision and lead to unexpected trauma to the foot. Poorly controlled diabetes is often associated with polymicrobial infections, and an infected ulcer may spread to the underlying bone, leading to osteomyelitis.⁵ If treatment and wound care fail to heal the wound, it can progress and LEA may be necessary.⁶

The prevention of diabetic foot amputation starts with the prevention and treatment of DFU.⁷ Such methods include sharp debridement, offloading, and revascularization.^4,8 If preventive efforts are made early, appropriate care plans can be formulated at an early stage of wound occurrence to prevent the worsening of DFU and consequent diabetic foot amputation.⁹ Research has also shown that early revascularization and offloading can improve limb salvage rates.^8,10-12

Accurate prediction of a diabetic foot amputation event will allow clinicians to proactively address the risks of the patient and start to implement treatment plans and preventive measures at the early stages of DFU, potentially leading to significant cost savings for both individuals and health care systems.

Previous research in this context includes the utilization of conventional statistical methods such as logistic regression, survival analysis or Bayesian statistics to identify precursors and risk factors of diabetic foot amputation.^13-15 However, such methods do not capture the complex nonlinear relationships between the features and might compromise prediction accuracy.¹⁶

Machine learning (ML) algorithms have demonstrated the ability to capture complex and difficult patterns. Such algorithms can process data in higher-dimensional space and potentially improve the prediction accuracy compared to traditional statistical models.^17,18 In recent years, ML has found many successes in applications to problems in health care,^19-21 including the prediction and diagnosis of DFU and complications such as LEA.²²

Other studies have also applied ML to the prediction of diabetic foot amputations. However, the sample sizes of some studies are relatively small^23-25 and might not be fully representative of the population. Stefanopoulos et al²⁶ used a large sample of more than 300 000 patients for modeling, although their prediction horizon was major amputation within the hospitalization episode, whereas this study uses a 180-day horizon from the time of admission.

In addition, some of these studies are limited in other respects, such as focusing on only one type of amputation outcome (such as minor or major amputation)^23,24,26 or using only one type of ML algorithm to develop the model.^25,26 Other studies used a variety of ML algorithms, but the predictions generated might not be explainable owing to model complexity,^23,24 which can limit their interpretation by patients or health care providers.²⁷ Table 1 summarizes the studies that employed ML algorithms for the prediction of diabetic foot amputations.

Table 1.

Summarized Studies of Prediction of Diabetic Foot Amputations Using ML Algorithms.

Authors	Sample size	Features considered	Type of amputation	ML algorithms employed	Performance
Lin et al²³	200 patients	69 features examined from lab results and indexes; 12 features selected and used for modeling	Any LEA	Back propagation neural network (BPNN); BPNN based on genetic algorithm optimization	BPNN: AUC 0.924, sensitivity 100%, specificity 81.82%; BPNN based on genetic algorithm optimization: AUC 0.891, sensitivity 100%, specificity 78.95%
Wang et al²⁴	362 cases (University of Texas grade 3 ulcers); minor amputation: 75; no amputation: 287	21 features from various categories: medical history, demographics, wound characteristics, lab results	Minor amputation	Decision tree, random forest, logistic regression, support vector machine, and XGBoost	XGBoost showed the best performance: AUC 0.881, accuracy 81.4%, sensitivity (recall) 76.7%, precision 84.6%, F1-score 80.5%
Xie et al^25,a	618 patients; major amputation: 47; minor amputation: 71; no amputation: 500	38 features from various categories: medical history, demographics, wound characteristics, lab results	No amputation, minor or major amputation (multiclass model)	Light gradient boosting machine (LightGBM)	Weighted-average AUC 0.90, sensitivity (recall) 87.1%, specificity 74.4%, NPV 79.7%, PPV 86.3%
Stefanopoulos et al²⁶	326 853 patients; major amputation: 19 344; no amputation: 307 509	5 features selected for modeling: gangrene, osteomyelitis, peripheral vascular disease, systemic infection, and weight loss	Major amputation	Random forest (CTREE)	AUC 0.84 (95% CI = 0.83-0.85), sensitivity (recall) 76.2%, specificity 79.4%
Current work^a	Major amputation: 2559 cases (major amputation: 683; no major amputation: 1876); minor amputation: 2445 cases (minor amputation: 670; no minor amputation: 1775); any LEA: 2532 cases (LE amputation: 1203; no LE amputation: 1329)	51 features from various categories: medical history, demographics, wound characteristics, lab results, and medication history	Major amputation, minor amputation, and any LEA (3 models, 1 for each outcome)	Decision tree, random forest, gradient boosted trees, logistic regression, support vector machine, XGBoost, CatBoost, and AdaBoost	Major amputation: XGBoost showed the best performance (AUC 0.820, balanced-accuracy 74.9%, F1 score 60.0%); minor amputation: XGBoost showed the best performance (AUC 0.637, balanced-accuracy 60.1%, F1 score 46.8%); any LEA: gradient boosted trees showed the best performance (AUC 0.756, balanced-accuracy 68.4%, F1 score 67.4%)

Abbreviations: ML, machine learning; BPNN, back propagation neural network; AUC, area under curve; NPV, negative predictive value; PPV, positive predictive value; CTREE, conditional inference trees; CI, confidence interval; LEA, lower extremity amputation.

^a Developed models are explainable owing to the application of explainable AI (XAI) techniques or the nature of the algorithm itself (e.g., visualization of a decision tree).

This study aims to use ML algorithms to develop predictive models capable of predicting the risk of LEA in DFU with good performance. It applies a large repertoire of ML algorithms and evaluates and compares their performance. The larger sample size and greater variety of features collected are intended to support the generalizability of the resulting models. For comprehensiveness, this study also developed 3 ML models to examine the key risk factors of each amputation outcome (major, minor, and any LEA). Finally, a model-agnostic explainability method is employed to interpret the output of the ML models, which may generate new insights into factors that contribute to the risk of diabetic foot amputation. The interpretability of the models is intended to improve the understanding of their predictions and to increase the uptake of ML models as an aid to clinical decisions.

Methods

Data Collection and Study Design

A comprehensive data set was retrospectively collected from electronic medical records from 2012 to 2017. The data set included 5043 in-hospital episodes of 2522 unique patients admitted for DFU in Tan Tock Seng Hospital, Singapore. Episodes in which the patient was below 18 years old or had missing wound characteristics were excluded from the study. The data set included features such as demographics, wound characteristics, laboratory results, medication history, and comorbidities, with comorbidities identified using the ICD-10-AM diagnosis codes.²⁸ The list of ICD codes used to identify patients with DFU and their respective comorbidities can be found in Supplemental Appendix A. Wound characteristics data were extracted as discrete data elements from the institution's wound-specific electronic medical records for inpatient wounds. The list of wound characteristics can be found in Supplemental Appendix B.

Episodes in which the patient experienced an amputation event within 180 days after the admission date are classified as the case group. Episodes in which the patient did not experience an amputation event within 180 days are classified as the control group. For the control group, we excluded episodes in which the patient died within 180 days from the admission date.

There were three different amputation outcomes collected: (1) major amputation, (2) minor amputation, and (3) either major or minor amputation (any LEA). Major amputation is defined as any amputation above the ankle while minor amputation is defined as any amputation below the ankle.
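The case and control definitions above can be sketched as a small labeling function. This is a minimal illustration, not the study's code: the function name, the argument names, and the choice to treat the 180-day bound as inclusive are assumptions.

```python
from datetime import date, timedelta
from typing import Optional

HORIZON_DAYS = 180  # prediction horizon stated in the text


def label_episode(admission: date,
                  amputation: Optional[date],
                  death: Optional[date]) -> Optional[str]:
    """Return 'case', 'control', or None when the episode is excluded."""
    cutoff = admission + timedelta(days=HORIZON_DAYS)
    # Case: an amputation on or before the cutoff (inclusive bound is an assumption).
    if amputation is not None and admission <= amputation <= cutoff:
        return "case"
    # Episodes without an amputation in the window are excluded if the patient died in it.
    if death is not None and death <= cutoff:
        return None
    return "control"


admit = date(2015, 1, 1)
print(label_episode(admit, date(2015, 3, 1), None))   # amputation inside the window
print(label_episode(admit, None, date(2015, 2, 1)))   # died without amputation
print(label_episode(admit, date(2016, 1, 1), None))   # amputation after the window
```

In this sketch an amputation that occurs after the window leaves the episode in the control group, which follows from the definitions in the text.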

A total of three ML models were developed, one for each outcome, with the objective of predicting the risk of an amputation event within 180 days from the patient’s admission date. The participation flow diagram for each outcome is shown in Figures 1 to 3.

Figure 1.

Participation flow diagram for (1) major amputation.

Figure 2.

Participation flow diagram for (2) minor amputation.

Figure 3.

Participation flow diagram for (3) any lower extremity amputation.

Statistical Analysis

Descriptive analysis was performed on all three different scenarios of amputation outcomes. Continuous features were first tested for normality using Shapiro-Wilk’s test.²⁹ Features that are normally distributed are presented as mean ± standard deviation, while those that are nonnormally distributed are presented as median with interquartile range. For testing of statistical significance between the case and control group, t-test was employed for normally distributed features³⁰ and Mann–Whitney U test was employed for nonnormally distributed features.³¹

Categorical variables are expressed as counts (n) with percentages (%), and the Pearson Chi-square test was employed to evaluate the significance of differences between the case and control groups.³² A p value of <.05 was considered statistically significant. Statistical analysis of the baseline characteristics between patient cohorts can be found in Supplemental Appendix C. All statistical analysis was performed using the statsmodels (Version 0.14.0) and SciPy (Version 1.11.1) packages from Python.
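As an illustration of the categorical comparison, the Pearson Chi-square statistic for a 2 x 2 table can be computed by hand. The sketch below uses only the standard library rather than the SciPy routines used in the study, applies no continuity correction, and uses hypothetical counts.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p value for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = case/control, columns = feature present/absent.
stat, p = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, p = {p:.4f}, significant = {p < 0.05}")
```

Library implementations may apply a continuity correction to 2 x 2 tables by default, so their p values can differ slightly from this uncorrected calculation.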

Data Preprocessing and Engineering

Features with more than 20% missing data were excluded and not used for modeling. The remaining features were assumed to be missing at random and were imputed using the multiple imputation by chained equations (MICE) technique.³³
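The 20% exclusion rule can be sketched as follows. The feature names and values are hypothetical, and the imputation step itself is not reproduced here.

```python
MISSING_THRESHOLD = 0.20  # cut-off stated in the text


def split_by_missingness(columns):
    """Split feature names into (kept, dropped) by fraction of missing values."""
    kept, dropped = [], []
    for name, values in columns.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        (dropped if missing_fraction > MISSING_THRESHOLD else kept).append(name)
    return kept, dropped


# Hypothetical feature columns; None marks a missing value.
columns = {
    "feature_a": [9.1, 12.4, None, 15.0, 8.7, 11.2, 10.3, 9.9, 13.1, 14.8],  # 10% missing
    "feature_b": [None, None, 8.2, None, 9.0, 7.7, 10.1, 8.8, 9.4, 7.9],     # 30% missing
}
kept, dropped = split_by_missingness(columns)
print(kept, dropped)
```

Only the features in `kept` would be passed on to imputation in this sketch.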

The data set is split into a “train set” and a “test set” with a ratio of 70:30. The train set is used for model development, while the test set is held out and used to evaluate model performance. Continuous features are standardized using the mean and standard deviation computed from the train set, and the same values are then applied to transform the test set. Categorical features are treated using one-hot encoding. After applying the above methods to the data set, 51 features were used for modeling. A list of features used for modeling can be found in Supplemental Appendix B.
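The preprocessing described above can be illustrated with a short standard-library sketch. The key point is that the scaling parameters are learned from the train set only and then reused on the test set. The random seed, the unstratified split, the use of the population standard deviation, and the category names are all assumptions made for illustration.

```python
import random
import statistics


def train_test_split_70_30(rows, seed=0):
    """Shuffle the rows and split them 70:30 (the seed is an arbitrary choice)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_standardizer(train_values):
    """Learn mean and standard deviation from the train set only."""
    return statistics.mean(train_values), statistics.pstdev(train_values)


def standardize(values, mean, sd):
    """Apply previously learned parameters to any set of values."""
    return [(v - mean) / sd for v in values]


def one_hot(value, categories):
    """Encode one categorical value as a list of 0/1 indicators."""
    return [1 if value == c else 0 for c in categories]


values = [float(v) for v in range(1, 21)]   # hypothetical continuous feature
train, test = train_test_split_70_30(values)
mean, sd = fit_standardizer(train)
train_z = standardize(train, mean, sd)
test_z = standardize(test, mean, sd)        # reuses the train statistics
print(len(train), len(test), one_hot("dry", ["dry", "moist", "wet"]))
```

Reusing the train statistics keeps information from the held-out test set out of model development.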

Model Development

A total of 8 ML algorithms were applied to train the model; each algorithm is briefly described in Table 2.

Table 2.

List of ML Algorithms Applied.

ML algorithm	Description
Logistic regression (with LASSO)³⁴	• Models the relationship between the features and a binary outcome by estimating the probability of the outcome belonging to a particular class. • Might incorporate regularization techniques such as “Least Absolute Shrinkage and Selection Operator” (LASSO) to mitigate overfitting.
Support vector machine (SVM)³⁵	• Finds an optimal hyperplane that maximizes the margin between classes while minimizing classification errors. • Variants of SVM include the use of different kernels such as radial basis functions to handle non-linear data distributions.
Decision tree³⁶	• Creates a tree-like model of decisions by recursively splitting the data based on the values of the features, until it reaches the leaf node that corresponds to the respective class.
Random forest³⁷	• Combines multiple decision trees to make predictions by using an ensemble learning method called bagging,³⁸ where each tree is trained on a random subset of the training data drawn with replacement. • Reduces the likelihood of overfitting as each tree focuses on different subsets of the data, leading to a more generalized model.
Gradient boosted trees³⁹	• Uses another branch of ensemble learning called boosting, which combines multiple weak decision trees to create better classifiers.⁴⁰ • Boosting is a sequential and iterative process of training models to “correct” the errors made by previous models, by focusing more on the misclassified predictions from previous iterations.
XGBoost⁴¹	• A variant of Gradient Boosted Trees that introduces a regularization element to prevent overfitting.
AdaBoost⁴²	• A boosting algorithm that assigns weights to training examples, emphasizing misclassified instances in subsequent iterations.
CatBoost⁴³	• A variant of Gradient Boosted Trees that grows symmetric trees, which controls model complexity.

Abbreviations: ML, Machine Learning; LASSO, Least Absolute Shrinkage and Selection Operator; SVM, Support Vector Machine.
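To make the boosting idea in Table 2 concrete, the sketch below performs one round of the classic discrete AdaBoost weight update on hypothetical data. It illustrates how misclassified examples gain weight for the next weak learner and is not the implementation used in the study.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One AdaBoost round: return (alpha, new normalized weights)."""
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    alpha = 0.5 * math.log((1 - error) / error)   # weight given to this weak learner
    updated = [w * math.exp(alpha if wrong else -alpha)
               for w, wrong in zip(weights, misclassified)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Five hypothetical training examples with equal starting weights;
# the weak learner gets only the last one wrong.
weights = [0.2] * 5
misclassified = [False, False, False, False, True]
alpha, new_weights = adaboost_reweight(weights, misclassified)
print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

After the update, the misclassified example carries half of the total weight, so the next weak learner is pushed to classify it correctly.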

The ML model is trained using the “train set,” and its hyperparameters are tuned using Bayesian optimization^44,45 to find the hyperparameters that give the best model performance. Each set of hyperparameters was evaluated using repeated stratified five-fold cross-validation with six repeats. Bayesian optimization uses the results from past iterations to explore the parameter space, updating its model based on observed evaluations and making informed decisions about the next set of parameters to evaluate. The process stops when it reaches a maximum number of iterations or when the early stopping criterion is met.
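The cross-validation scheme can be sketched in plain Python. The generator below is an illustrative stand-in for a library splitter: it deals the indices of each class round-robin into five folds and repeats the procedure six times with fresh shuffles. The function name, the seed, and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=6, seed=0):
    """Yield (train_idx, valid_idx) pairs; n_splits * n_repeats pairs in total."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps the class ratio.
            for position, idx in enumerate(shuffled):
                folds[position % n_splits].append(idx)
        for k in range(n_splits):
            valid = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, valid


labels = [1] * 10 + [0] * 30          # hypothetical labels, 25% positive
splits = list(repeated_stratified_kfold(labels))
print(len(splits))                     # 5 folds x 6 repeats
```

Each candidate set of hyperparameters would be scored on all of these train/validation pairs and the scores averaged, which reduces the dependence of the tuning result on any single split.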

Model Evaluation

The performance of the model is evaluated using the following metrics: area under the receiver operating characteristic curve (AUROC), F1-score, and balanced-accuracy. The AUROC represents the degree of separability between the two classes,⁴⁶ accuracy measures how often the model predicts the correct class, and the F1-score is the harmonic mean of precision and sensitivity.⁴⁷
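One way to read the AUROC is as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it by direct pairwise comparison on hypothetical scores; library routines arrive at the same quantity more efficiently.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks for six episodes (three cases, three controls).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(auroc(y_true, scores))
```

In this toy example eight of the nine case-control pairs are ranked correctly, so the AUROC is 8/9.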

Compared with accuracy, which may be biased toward the majority class, balanced-accuracy averages the accuracy obtained on each class. This makes balanced-accuracy a robust metric for imbalanced classification problems,⁴⁸ which is appropriate for evaluating the major amputation and minor amputation models (26.7% and 27.4% incidence, respectively). The equation below shows the formula for balanced-accuracy.

\text{Balanced-accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2}
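A short worked example shows why balanced-accuracy is more informative than accuracy at roughly 27% incidence. The confusion counts below are hypothetical; the final line applies the formula to the sensitivity and specificity reported for Stefanopoulos et al in Table 1.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, balanced accuracy, accuracy and F1 from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts with 27% positives: a model that flags almost nobody.
m = classification_metrics(tp=10, fp=5, tn=725, fn=260)
print({k: round(v, 3) for k, v in m.items()})

# Table 1 check: sensitivity 76.2% and specificity 79.4% (Stefanopoulos et al).
print(round((76.2 + 79.4) / 2, 1))
```

The hypothetical model reaches 73.5% accuracy while its balanced-accuracy stays close to 50%, which exposes its failure on the minority class.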

The model with the best set of evaluation metrics is then selected.

Model Explanation

The SHapley Additive exPlanations (SHAP) algorithm, one of the popular model-agnostic methods for model explanation, is applied to interpret the model results. It uses concepts from cooperative game theory to quantify the contribution of each feature to the final prediction. SHAP estimates the average marginal contribution of each feature, which is used as a form of feature importance.⁴⁹
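The notion of an average marginal contribution can be made concrete with an exact Shapley value calculation on a toy model. This sketch enumerates every feature coalition, which is feasible only for a handful of features; the SHAP package relies on faster estimators. The risk function, feature names, patient values, and baseline values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = without_i[:]
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    twc, cci, necrosis = features
    return 0.02 * twc + 0.05 * cci + 0.10 * necrosis * cci


x = [15.0, 4.0, 1.0]          # hypothetical patient
baseline = [9.0, 2.0, 0.0]    # hypothetical reference values
phi = shapley_values(toy_risk, x, baseline)
print([round(v, 3) for v in phi])
```

The contributions sum to the difference between the prediction for the patient and the prediction at the baseline, which is the additivity property that gives SHAP its name.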

Packages Used

The Python 3 programming language was used for both statistical analysis and modeling.^{41,43,45,49-55} A list of the packages used and their functions can be found in Supplemental Appendix D.

Results

Model Performance

The best trained model using data from the train set is reported with mean and 95% confidence interval. The model with the best performance in each ML algorithm is then evaluated using the test set.

The best performing model for predicting major amputation is XGBoost, with an AUROC and balanced-accuracy of 0.820 and 0.749, respectively. This showed that the model has demonstrated good performance in discriminating between the two classes, and its performance is also consistent in the prediction of the minority class. As for the model for any LEA, Gradient Boosted Trees performed the best with an AUROC of 0.756 and balanced-accuracy of 0.684.

However, the minor amputation model showed poorer performance than the other two models, with XGBoost as the best performing algorithm with an AUROC and balanced-accuracy of 0.637 and 0.601, respectively. The results of the model performance on the train set are shown in Figures 4 to 6. Tables 3 to 5 show the performance of the models applied to the test set. Detailed values of the model performance can be found in Supplemental Appendix E.

Figure 4.

Model performance on the train set of (1) major amputation model.

Figure 5.

Model performance on the train set of (2) minor amputation model.

Figure 6.

Model performance on the train set of (3) lower extremity amputation model.

Table 3.

Model Results for the (1) Major Amputation Model.

ML algorithm	Balanced accuracy (test set)	AUROC (test set)	F1 score (test set)
Logistic regression	0.720	0.798	0.559
Support vector machines	0.660	0.801	0.489
Decision tree	0.643	0.664	0.487
Random forest	0.748	0.801	0.567
Gradient boosted trees	0.659	0.810	0.634
XGBoost	0.749	0.820	0.600
CatBoost	0.728	0.804	0.571
AdaBoost	0.644	0.776	0.512

Abbreviations: ML, machine learning; AUROC, area under the receiver operating characteristic curve.

Table 4.

Model Results for the (2) Minor Amputation Model.

ML algorithm	Balanced accuracy (test set)	AUROC (test set)	F1 score (test set)
Logistic regression	0.616	0.613	0.455
Support vector machines	0.564	0.626	0.288
Decision tree	0.575	0.531	0.421
Random forest	0.603	0.610	0.449
Gradient boosted trees	0.547	0.617	0.262
XGBoost	0.601	0.637	0.468
CatBoost	0.599	0.632	0.471
AdaBoost	0.453	0.598	0.231

Abbreviations: ML, machine learning; AUROC, area under the receiver operating characteristic curve.

Table 5.

Model Results for the (3) Lower Extremity Amputation Model.

ML algorithm	Balanced accuracy (test set)	AUROC (test set)	F1 score (test set)
Logistic regression	0.658	0.736	0.647
Support vector machines	0.653	0.702	0.625
Decision tree	0.687	0.660	0.675
Random forest	0.671	0.732	0.660
Gradient boosted trees	0.684	0.756	0.674
XGBoost	0.662	0.730	0.690
CatBoost	0.674	0.737	0.630
AdaBoost	0.660	0.724	0.648

Abbreviations: ML, machine learning; AUROC, area under the receiver operating characteristic curve.
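The reported model selections can be checked against the test-set AUROC values in Tables 3 to 5. The sketch below assumes that AUROC is the primary ranking criterion, which the text does not state explicitly; other metrics in the tables do not always favor the same algorithm.

```python
# Test-set AUROC values copied from Tables 3 to 5.
auroc_by_outcome = {
    "major": {
        "Logistic regression": 0.798, "Support vector machines": 0.801,
        "Decision tree": 0.664, "Random forest": 0.801,
        "Gradient boosted trees": 0.810, "XGBoost": 0.820,
        "CatBoost": 0.804, "AdaBoost": 0.776,
    },
    "minor": {
        "Logistic regression": 0.613, "Support vector machines": 0.626,
        "Decision tree": 0.531, "Random forest": 0.610,
        "Gradient boosted trees": 0.617, "XGBoost": 0.637,
        "CatBoost": 0.632, "AdaBoost": 0.598,
    },
    "any LEA": {
        "Logistic regression": 0.736, "Support vector machines": 0.702,
        "Decision tree": 0.660, "Random forest": 0.732,
        "Gradient boosted trees": 0.756, "XGBoost": 0.730,
        "CatBoost": 0.737, "AdaBoost": 0.724,
    },
}

# Pick the algorithm with the highest AUROC for each outcome.
best = {outcome: max(scores, key=scores.get)
        for outcome, scores in auroc_by_outcome.items()}
for outcome, algorithm in best.items():
    print(f"{outcome}: {algorithm} ({auroc_by_outcome[outcome][algorithm]:.3f})")
```

Ranking by AUROC reproduces the selections reported in the text: XGBoost for major and minor amputation, and gradient boosted trees for any LEA.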

Model Explainability by SHAP

Figures 7 to 9 show the feature importance charts, calculated using SHAP values, displaying the top 12 features used by each model. A higher SHAP value indicates that the feature makes a larger contribution to the model prediction. Figures 10 to 12 show the beeswarm plots of the models. The beeswarm plot helps to understand the impact of individual features on model predictions.
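A feature importance chart of this kind is commonly built by averaging the absolute SHAP value of each feature across all episodes and keeping the largest values. The sketch below assumes that convention, which the text does not spell out, and uses hypothetical feature names and SHAP values.

```python
def rank_features(shap_matrix, feature_names, top_k=12):
    """Rank features by mean absolute SHAP value across episodes."""
    n_rows = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical SHAP values: one row per episode, one column per feature.
feature_names = ["twc", "cci", "rbc"]
shap_matrix = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.15, -0.05],
    [0.40, -0.05, 0.02],
]
ranking = rank_features(shap_matrix, feature_names)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling out; the beeswarm plot retains the sign and therefore shows the direction of each effect.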

Figure 7.

Feature importance (using SHAP values) of the (1) major amputation model.

Figure 8.

Feature importance (using SHAP values) of the (2) minor amputation model.

Figure 9.

Feature importance (using SHAP values) of the (3) any lower extremity amputation model.

Figure 10.

Beeswarm plot (using SHAP values) of the (1) major amputation model.

Figure 11.

Beeswarm plot (using SHAP values) of the (2) minor amputation model.

Figure 12.

Beeswarm plot (using SHAP values) of the (3) any lower extremity amputation model.

Discussion

Model Performance

The major amputation prediction model demonstrated good performance and predictive ability, achieving an AUROC and balanced-accuracy of 0.820 and 0.749, respectively, using the XGBoost algorithm. This is consistent with the study by Stefanopoulos et al,²⁶ who demonstrated similar performance in predicting the risk of major amputation (AUROC: 0.84; accuracy: 77.8%). The two studies differ in the timepoint of prediction: this study aims to predict a major amputation event within 180 days from admission, whereas Stefanopoulos et al aimed to predict a major amputation event during the hospitalization stay. The results from both studies complement each other in highlighting risk factors for predicting major amputation from clinical data.

The minor amputation model did not perform as well as the major amputation model, with an AUROC and balanced-accuracy of 0.637 and 0.601, respectively. Wang et al²⁴ specifically focused on using ML for prediction of minor amputations in patients with poor wound status (University of Texas Grade 3 and above), achieving an AUC of 0.881. In comparison, our study did not have strict inclusion criteria and included all patients with DFU regardless of severity. Wang et al²⁴ used the Synthetic Minority Oversampling TEchnique (SMOTE) to correct for class disparity in the initial data set, which can reduce data imbalance but can also lead to model overfitting, especially when case numbers are small. In comparison, we did not use oversampling techniques but used the multiple imputation chained equations (MICE) technique for missing data. Statistical analysis of our study cohort (found in Supplemental Appendix C) showed that the minor amputation and no minor amputation groups are not significantly different from each other, which can explain the difficulty the model has in discriminating between the two groups and therefore its poorer performance. As the poorer model performance stems from a lack of discrimination between the features of the two groups rather than from an imbalanced data set, oversampling techniques might not help overcome this limitation.

The clinical care model for DFUs in the institution uses an inpatient multidisciplinary care pathway across multiple disciplines (consisting of vascular surgeons, orthopedic surgeons, endocrinologists, rehabilitation physicians, nurses, podiatrists, orthotists, physiotherapists, social workers, and case managers). Although the pathway aims to reduce variations in care, differences in practice patterns across specialty teams might also lead to poor model discrimination. Further studies with larger sample sizes and more features can be conducted to improve model performance.

The model performance results show that ensemble learning models performed well in this context, as the best-performing models for all three scenarios are boosting algorithms (XGBoost for major and minor amputation, Gradient Boosted Trees for LE amputation). Several other studies in this context have also found success using ensemble learning models for prediction.^24,56 Machine learning methods have the advantage of evaluating all available information and features, as compared to traditional statistical approaches where variables are selected based on presumed association.⁵⁷

The addition of balanced-accuracy as a model evaluation metric showed that the model performs consistently in predicting both the case and control groups, especially for the prediction of a major amputation, which supports its reliability for implementation.

Features Picked up by SHAP for Explainability

Features used by the ML model and explained by SHAP have been identified as key factors that are associated with the risk of diabetic foot amputation. For the major amputation prediction model, white blood cell (WBC) count being the most important feature selected by the model is not unexpected as a higher WBC count is typically associated with infection.¹³ Likewise for Charlson Comorbidity Index (CCI) score, a patient with more comorbidities is likely to have more diabetic-associated complications, disease severity, and poorer limb salvage outcome, thus leading to a higher risk of major amputation.⁵⁸ Red blood cell (RBC) count is not a common predictor for diabetic foot amputation, although anemia (low RBC count) has been reported to be significantly associated with poor wound healing and amputation.⁵⁹ Antiplatelet use such as aspirin and adenosine diphosphate (ADP) receptor inhibitors (e.g. Plavix) has also been identified as an important feature, likely reflecting that patients have underlying coronary artery disease, cerebrovascular disease, or peripheral vascular disease.^60-62 Wound characteristics such as necrotic eschar, osteomyelitis, depth, width, and length have also been identified as important features in the different models, with the presence of necrotic eschar being identified as the third most important feature for the model predicting any LEA. This supports the finding in some studies that wound features are an important aspect that predict long-term wound healing.⁶³ These features mentioned above have also been highlighted as important risk factors of diabetic foot amputation by other clinical studies, which show coherence between the model output and clinical interpretation.

Interestingly, red cell distribution width (RDW) is highlighted as a top feature in the minor amputation prediction model and as the sixth most important feature for major amputation prediction model. It is typically used as a clue in identifying nutritional deficiencies (e.g., iron, vitamin B12, and folate) in anemia; however, some studies have also identified RDW to be independently associated with worse outcomes and complications in diabetes.⁶⁴ Higher RDW values have been shown to be independently associated with vascular complications in diabetes,⁶⁵ and this was observed within the Singapore population as well.⁶⁶ The exact pathophysiology of this is unknown but thought to be related to the increased inflammatory burden seen in diabetes. This highlights the importance of using large data sets in ML models which can help identify important features that may not be thought to be important and excluded in traditional statistical modeling.

The key features identified by SHAP differ slightly between the major and minor amputation prediction models, suggesting that future treatment protocols to reduce the risk of a major or minor amputation should be tailored to each outcome.

Limitations

The development of the prediction model is limited to the features that are used to train it. Although the features used to train the model are comprehensive, there may still be other useful predictors of diabetic foot amputation that are not included in this study. Examples include the patient’s nutrition levels⁶⁷ and radiological imaging of the wound. Wound staging systems such as the Wagner classification and Wound, Ischemia, foot Infection (WIfI) scores were also not in routine use during the time period studied and thus were not included in the study. As the treatment of DFU is a multidisciplinary effort, including data from other clinical disciplines or areas related to the management and treatment of DFU (such as podiatric and orthotic interventions) might produce a more holistic model.

Owing to the retrospective nature of the study, missing data are inevitable and are a limitation of our study. Inpatient episodes with missing wound characteristic data have been excluded. Although the proportion of missing data was similar between the case and control groups, this exclusion can still lead to bias. The absence of such data and the workaround of using imputed values may also cause bias in the modeling. Moreover, the features used for modeling in this study were collected at a single time point, namely the patient’s admission. The features would be more informative if they were collected at multiple time points and in different care settings, such as outpatient settings and step-down care. The inclusion of such features may also improve prediction performance.

Future Work

Future expansion of the study can include the collection of more data at multiple time points to track the trajectory of the risk of amputation. This can include prospective studies, which can also be used to validate the model and its accuracy, as practice and clinical outcomes may change over time. The study could also incorporate other forms of data such as wound images and radiological imaging characteristics to improve the performance of the prediction model. Including wound staging systems such as the WIfI score or Wagner classification and evaluating the additive value of ML models on top of existing staging systems would also be helpful. Training a predictive model to ingest different modes of data calls for deep learning methods such as multimodal deep learning, which can also be considered in the future.^68,69 Integration of the model into the institution's electronic medical record system as a risk predictor for amputation can also be considered and may help provide personalized care.

Conclusion

In summary, this study applied a wide variety of ML algorithms to a comprehensive data set with a large sample size to predict the risk of LEA within 180 days of admission in patient episodes with DFU. Three ML models were developed to understand the key factors behind each amputation outcome (major, minor, and any LEA).

Machine learning techniques were shown to be effective in predicting the risk of a DFU patient experiencing amputation, especially major amputation, for which the model achieved an AUROC of 0.820. The balanced-accuracy metric showed that the model produces consistent accuracy in the prediction of both classes, making it a reliable model for implementation. Explainable modeling such as SHAP provides insights into the importance of each feature in contributing to the model’s output and makes the deployment of models “transparent” and more feasible. Future work can include additional features and prospective data to make the prediction model more comprehensive and holistic.
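To illustrate why balanced accuracy is informative when amputation events are the minority class, the sketch below contrasts it with plain accuracy. Balanced accuracy is the mean of sensitivity and specificity. The labels and predictions are hypothetical toy values and the helper names are illustrative; none of this reproduces the study's data or code.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for binary labels coded as 1 (event) and 0 (no event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all episodes classified correctly."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity; assumes both classes are present."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 2 events among 10 episodes.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                       # never predicts an amputation
some_detection = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # finds one event, one false alarm

for name, pred in [("always negative", always_negative), ("some detection", some_detection)]:
    print(name, accuracy(y_true, pred), balanced_accuracy(y_true, pred))
```

Both toy classifiers have the same plain accuracy, but only the second one detects any events, and this difference is visible only in the balanced accuracy.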

Supplemental Material

sj-docx-1-dst-10.1177_19322968241228606 – Supplemental material for Risk Prediction of Diabetic Foot Amputation Using Machine Learning and Explainable Artificial Intelligence

Supplemental material, sj-docx-1-dst-10.1177_19322968241228606 for Risk Prediction of Diabetic Foot Amputation Using Machine Learning and Explainable Artificial Intelligence by Chien Wei Oei, Yam Meng Chan, Xiaojin Zhang, Kee Hao Leo, Enming Yong, Rhan Chaen Chong, Qiantai Hong, Li Zhang, Ying Pan, Glenn Wei Leong Tan and Malcolm Han Wen Mak in Journal of Diabetes Science and Technology

Footnotes

Abbreviations

AUROC, area under the receiver operating characteristic curve; CCI, Charlson Comorbidity Index; DFU, diabetic foot ulcer; LEA, lower extremity amputation; MICE, multiple imputation by chained equations; ML, machine learning; ROC, receiver operating characteristic; SHAP, SHapley Additive exPlanations; SMOTE, synthetic minority oversampling technique.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Chien Wei Oei

Supplemental Material

Supplemental material is available in the online version of the article.

References

1. Surendra, Saxena, Car. Clinical and economic burden of diabetic foot ulcers: a 5-year longitudinal multi-ethnic cohort study from the tropics. Int Wound J. 2021;18(3):375-386. doi:10.1111/iwj.13540.

2. Hingorani, LaMuraglia, Henke, et al. The management of diabetic foot: a clinical practice guideline by the Society for Vascular Surgery in collaboration with the American Podiatric Medical Association and the Society for Vascular Medicine. J Vasc Surg. 2016;63(suppl 2):3S-21S. doi:10.1016/j.jvs.2015.10.003.

3. Riandini, Pang, Toh MPHS, et al. National rates of lower extremity amputation in people with and without diabetes in a multi-ethnic Asian population: a ten year study in Singapore. Eur J Vasc Endovasc Surg. 2022;63(1):147-155. doi:10.1016/j.ejvs.2021.09.041.

4. Del Core, Ahn, Lewis, Raspovic, Lalli TAJ, Wukich DK. The evaluation and treatment of diabetic foot ulcers and diabetic foot infections. Foot Ankle Orthop. 2018;3(3):2473011418788864. doi:10.1177/2473011418788864.

5. Malhotra, Chan, Nather. Osteomyelitis in the diabetic foot. Diabet Foot Ankle. 2014;5:10.3402/dfa.v5.24445. doi:10.3402/dfa.v5.24445.

6. Monami, Ragghianti, Nreu, et al. Major Amputation in non-healing ulcers: outcomes and economic issues. Data from a cohort of patients with diabetic foot ulcers. Int J Low Extrem Wounds. doi:10.1177/15347346221097283.

7. Everett, Mathioudakis. Update on management of diabetic foot ulcers. Ann N Y Acad Sci. 2018;1411(1):153-165. doi:10.1111/nyas.13569.

8. Kota, Meher, Sahoo, Mohapatra, Modi KD. Surgical revascularization techniques for diabetic foot. J Cardiovasc Dis Res. 2013;4(2):79-83. doi:10.1016/j.jcdr.2012.10.002.

9. Meltzer, Pels, Payne, et al. Decreasing amputation rates in patients with diabetes mellitus. An outcome study. J Am Podiatr Med Assoc. 2002;92(8):425-428. doi:10.7547/87507315-92-8-425.

10. Armstrong, Cohen, Courric, Bharara, Marston. Diabetic foot ulcers and vascular insufficiency: our population has changed, but our methods have not. J Diabetes Sci Technol. 2011;5(6):1591-1595. doi:10.1177/193229681100500636.

11. Jiang, Yuan, et al. Limb salvage and prevention of ulcer recurrence in a chronic refractory diabetic foot osteomyelitis. Diabetes Metab Syndr Obes. 2020;13:2289-2296. doi:10.2147/DMSO.S254586.

12. Maciejewski, Reiber, Smith, Wallace, Hayes, Boyko EJ. Effectiveness of diabetic therapeutic footwear in preventing reulceration. Diabetes Care. 2004;27(7):1774-1782. doi:10.2337/diacare.27.7.1774.

13. Won, Chung, Park, et al. Risk factors associated with amputation-free survival in patient with diabetic foot ulcers. Yonsei Med J. 2014;55(5):1373-1378. doi:10.3349/ymj.2014.55.5.1373.

14. Hüsers, Hafer, Heggemann, Wiemeyer, John, Hübner. Predicting the amputation risk for patients with diabetic foot ulceration: a Bayesian decision support tool. BMC Med Inform Decis Mak. 2020;20(1):200. doi:10.1186/s12911-020-01195-x.

15. Wang, Wei, Wang. Risk factors for major amputation in diabetic foot ulcer patients. Diabetes Metab Syndr Obes. 2021;14:2019-2027. doi:10.2147/DMSO.S307815.

16. Basu, Manning, Mullahy. Comparing alternative models: log vs Cox proportional hazard. Health Econ. 2004;13(8):749-765. doi:10.1002/hec.852.

17. Premsagar, Aldous, Esterhuizen, Gomes, Gaskell, Tabb DL. Comparing conventional statistical models and machine learning in a small cohort of South African cardiac patients. Inform Med Unlocked. 2022;34:101103. doi:10.1016/j.imu.2022.101103.

18. Desai, Wang, Vaduganathan, Evers, Schneeweiss. Comparison of machine learning methods with traditional models for use of administrative claims with electronic medical records to predict heart failure outcomes. JAMA Netw Open. 2020;3(1):e1918962. doi:10.1001/jamanetworkopen.2019.18962.

19. Gao, Cheng, Suganthan, Yuen KF. Inpatient discharges forecasting for Singapore hospitals by machine learning. IEEE J Biomed Health Inform. 2022;26(10):4966-4975. doi:10.1109/JBHI.2022.3172956.

20. Davis, Zhang, Lee, et al. Effective hospital readmission prediction models using machine-learned features. BMC Health Serv Res. 2022;22(1):1415. doi:10.1186/s12913-022-08748-y.

21. Nithya, Ilango. Predictive analytics in health care using machine learning tools and techniques. International Conference on Intelligent Computing and Control Systems (ICICCS); June 15-16, 2017; Madurai, India. doi:10.1109/ICCONS.2017.8250771.

22. Huang, Yeung, Armstrong, et al. Artificial intelligence for predicting and diagnosing complications of diabetes. J Diabetes Sci Technol. 2023;17(1):224-238. doi:10.1177/19322968221124583.

23. Lin, Yuan, Yang, Yin, Lin. The amputation and survival of patients with diabetic foot based on establishment of prediction model. Saudi J Biol Sci. 2020;27(3):853-858. doi:10.1016/j.sjbs.2019.12.020.

24. Wang, Zhu, Tan. Machine learning for the prediction of minor amputation in University of Texas grade 3 diabetic foot ulcers. PLoS ONE. 2022;17(12):e0278445. doi:10.1371/journal.pone.0278445.

25. Xie, Deng, et al. An explainable machine learning model for predicting in-hospital amputation rate of patients with diabetic foot ulcer. Int Wound J. 2022;19(4):910-918. doi:10.1111/iwj.13691.

26. Stefanopoulos, Qiu, Ren, et al. A machine learning model for prediction of amputation in diabetics. J Diabetes Sci Technol. doi:10.1177/19322968221142899.

27. Bharati, Mondal MRH, Podder. A review on explainable artificial intelligence for healthcare: why, how, and when? IEEE Trans Artif Intell. 2023:1-15. doi:10.1109/TAI.2023.3266418.

28. Quan, Sundararajan, Halfon, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care. 2005;43(11):1130-1139. doi:10.1097/01.mlr.0000182534.19832.83.

29. Shapiro, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika. 1965;52(3/4):591-611. doi:10.2307/2333709.

30. Mishra, Singh, Pandey, Mishra, Pandey. Application of student’s t-test, analysis of variance, and covariance. Ann Card Anaesth. 2019;22(4):407-411. doi:10.4103/aca.ACA_94_19.

31. Mann, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat. 1947;18:50-60. https://www.jstor.org/stable/2236101. Accessed June 22, 2023.

32. Pearson K. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond Edinb Dublin Philos Mag J Sci. 1900;50(302):157-175. doi:10.1080/14786440009463897.

33. Jakobsen, Gluud, Wetterslev, Winkel. When and how should multiple imputation be used for handling missing data in randomised clinical trials: a practical guide with flowcharts. BMC Med Res Methodol. 2017;17(1):162. doi:10.1186/s12874-017-0442-1.

34. Tibshirani. Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Methodol. 1996;58(1):267-288.

35. Cortes, Vapnik. Support-vector networks. Mach Learn. 1995;20(3):273-297. doi:10.1007/BF00994018.

36. Fürnkranz. Decision Tree. In: Sammut, Webb, eds. Encyclopedia of Machine Learning. Boston, MA: Springer; 2010:263-267. doi:10.1007/978-0-387-30164-8_204.

37. Breiman. Random forests. Mach Learn. 2001;45(1):5-32. doi:10.1023/A:1010933404324.

38. Breiman. Bagging predictors. Mach Learn. 1996;24(2):123-140. doi:10.1007/BF00058655.

39. Bühlmann, Hothorn. Boosting algorithms: regularization, prediction and model fitting. Stat Sci. 2007;22(4):477-505. doi:10.1214/07-STS242.

40. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29(5):1189-1232.

41. Chen, Guestrin. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 13-17, 2016; San Francisco, CA. doi:10.1145/2939672.2939785.

42. Freund, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci. 1997;55(1):119-139. doi:10.1006/jcss.1997.1504.

43. Dorogush, Ershov, Gulin. CatBoost: gradient boosting with categorical features support. arXiv, 2018. doi:10.48550/arXiv.1810.11363.

44. Martinez-Cantin. BayesOpt: a Bayesian optimization library for nonlinear optimization, experimental design and bandits. arXiv, 2014. doi:10.48550/arXiv.1405.7430.

45. Snoek, Larochelle, Adams RP. Practical Bayesian optimization of machine learning algorithms. arXiv, 2012. doi:10.48550/arXiv.1206.2944.

46. Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997;30(7):1145-1159. doi:10.1016/S0031-3203(96)00142-2.

47. Chinchor. MUC-4 evaluation metrics. Proceedings of the Fourth Message Understanding Conference (MUC-4); June 16-18, 1992; McLean, VA. https://aclanthology.org/M92-1002. Accessed June 22, 2023.

48. Brodersen, Ong, Stephan, Buhmann JM. The balanced accuracy and its posterior distribution. 20th International Conference on Pattern Recognition; August 23-26, 2010; Istanbul, Turkey. doi:10.1109/ICPR.2010.764.

49. Lundberg, Lee S-I. A unified approach to interpreting model predictions. arXiv, 2017. doi:10.48550/arXiv.1705.07874.

50. McKinney. pandas: a foundational python library for data analysis and statistics. Python High Performance Science Computer, 2011. https://dlr.de/sc/portaldata/15/resources/dokumente/pyhpc2011/submissions/pyhpc2011_submission_9.pdf

51. Harris, Millman, van der Walt, et al. Array programming with NumPy. Nature. 2020;585:357-362. https://www.nature.com/articles/s41586-020-2649-2. Accessed June 20, 2023.

52. Virtanen, Gommers, Oliphant, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Method. 2020;17:261-272. https://www.nature.com/articles/s41592-019-0686-2. Accessed June 20, 2023.

53. Seabold, Perktold. Statsmodels: econometric and statistical modeling with python. Presented at the Python in Science Conference, June 28-July 3, 2010; Austin, TX. doi:10.25080/Majora-92bf1922-011.

54. Pedregosa, Varoquaux, Gramfort, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12(85):2825-2830.

55. Droettboom, Hunter, Firing, et al. Matplotlib version 1.4.0. Zenodo, 2014. doi:10.5281/zenodo.11451.

56. Xie, Zhang, et al. The amputation and mortality of inpatients with diabetic foot ulceration in the COVID-19 pandemic and postpandemic era: a machine learning study. Int Wound J. 2022;19(6):1289-1297. doi:10.1111/iwj.13723.

57. Rajula HSR, Verlato, Manchia, Antonucci, Fanos. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Medicina J. 2020;56:455. doi:10.3390/medicina56090455.

58. Gurney, Stanley, York, Rosenbaum, Sarfati. Risk of lower limb amputation in a national prevalent cohort of patients with diabetes. Diabetologia. 2018;61(3):626-635. doi:10.1007/s00125-017-4488-8.

59. Gezawa, Ugwu, Ezeani, Adeleye, Okpe, Enamino. Anemia in patients with diabetic foot ulcer and its impact on disease outcome among Nigerians: results from the MEDFUN study. PLoS ONE. 2019;14(12):e0226226. doi:10.1371/journal.pone.0226226.

60. Mavrogenis, Megaloikonomos, Antoniadou, et al. Current concepts for the evaluation and management of diabetic foot ulcers. EFORT Open Rev. 2018;3(9):513-525. doi:10.1302/2058-5241.3.180010.

61. Schrör. Aspirin and platelets: the antiplatelet action of aspirin and its role in thrombosis treatment and prophylaxis. Semin Thromb Hemost. 1997;23(4):349-356. doi:10.1055/s-2007-996108.

62. Nativel, Potier, Alexandre, et al. Lower extremity arterial disease in patients with diabetes: a contemporary narrative review. Cardiovasc Diabetol. 2018;17(1):138. doi:10.1186/s12933-018-0781-1.

63. Cho, Mattke, Gordon, Sheridan, Ennis. Development of a Model to predict healing of chronic wounds within 12 weeks. Adv Wound Care. 2020;9(9):516-524. doi:10.1089/wound.2019.1091.

64. Xiong, Yang, Chen, et al. Red cell distribution width as a significant indicator of medication and prognosis in type 2 diabetic patients. Sci Rep. 2017;7(1):2709. doi:10.1038/s41598-017-02904-9.

65. Malandrino, Taveira, Whitlatch, Smith RJ. Association between red blood cell distribution width and macrovascular and microvascular complications in diabetes. Diabetologia. 2012;55(1):226-235. doi:10.1007/s00125-011-2331-1.

66. Jie Chee, Seneviratna, Joo Lim, et al. Red cell distribution width is associated with mortality and cardiovascular complications in diabetes mellitus in Singapore. Eur J Prev Cardiol. 2020;27(2):216-219. doi:10.1177/2047487319836854.

67. Da Porto, Miranda, Brosolo, Zanette, Michelli, Ros. Nutritional supplementation on wound healing in diabetic foot: what is known and what is new? World J Diabetes. 2022;13(11):940-948. doi:10.4239/wjd.v13.i11.940.

68. Ngiam, Khor IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 2019;20(5):e262-e273. doi:10.1016/s1470-2045(19)30149-4.

69. Stahlschmidt, Ulfenborg, Synnergren. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform. 2022;23(2):bbab569. doi:10.1093/bib/bbab569.
