Abstract
Background:
Diabetic foot ulcer (DFU) and the resulting lower extremity amputation are associated with a poor survival prognosis. The objective of this study is to generate a model for predicting the probability of major amputation in hospitalized patients with DFU.
Methods:
The National Inpatient Sample (NIS) database from 2008 to 2014 was used to select patients with DFU, who were then further divided by major amputation status.
Results:
A total of 326 853 inpatients with DFU were identified, and 5.9% underwent major amputation. The top five contributory variables (all with
Conclusion:
Utilizing machine learning methods, we have developed a clinical algorithm that predicts the risk of major lower extremity amputation for inpatients with diabetes with 77.8% accuracy.
Keywords
Introduction
Diabetes and its associated complications pose a daunting and rising health burden in the United States. About 30.3 million Americans (9.4%) were affected by the disease in 2015, with an additional 1.5 million adults diagnosed annually.1 The financial costs of diabetes have also been steadily increasing. From 2012 to 2017, the total estimated cost of diabetes increased from $245 billion to $327 billion, with a $61 billion increase in direct medical costs and a $21 billion increase in lost productivity.2 Among patients with diabetes, diabetic foot ulcer (DFU) is one of the leading causes of hospitalization and precedes about 80% to 85% of lower extremity amputations.3,4 The lifetime incidence of foot ulcers in patients with diabetes has been estimated to be between 15% and 25%.5-8 Moreover, a DFU diagnosis is associated with a poor survival prognosis; the five-year mortality rate has been reported to be between 43% and 55% for patients with DFU and up to 74% for those with lower extremity amputations.9-11 In addition to its considerable impact on mortality, DFU is also associated with devastating financial, emotional, and psychological burdens.11-14
The profound impact that DFUs have on patients and society underscores the need to develop more effective diagnosis and treatment tools. In 2015, the Society for Vascular Surgery addressed this need by proposing a lower extremity wound classification system known as Wound, Ischemia, and foot Infection (WIfI).15 Some studies have since demonstrated the effectiveness of this classification system at predicting wound healing in patients with DFU and amputation in patients with lower extremity ischemia.16-18 However, others have shown that the WIfI classification system is not predictive of major amputation specifically in patients with DFU, a high-risk group for lower extremity amputations.16 Furthermore, the prognostic value of the WIfI system for amputation among patients with DFU is intrinsically limited because the system was built through expert consensus and not from clinical data.
Recent advances in machine learning techniques offer a new opportunity to generate prediction algorithms from large sets of clinical data.19 Such algorithms have already been put into use to predict a variety of clinical outcomes such as sepsis, dementia, readmission, and mortality post-chemotherapy.20-23 However, the prognostic potential of machine learning techniques, such as deep learning, has not been explored in the inpatient population with DFU. Thus, we aim to apply deep learning techniques to analyze a large inpatient database and identify the risk factors that predict major lower extremity amputations in patients with DFU. These factors would then be incorporated into a prognostic model that could aid in the early identification of high-risk patients.
Methods
Data Collection
We utilized the National Inpatient Sample (NIS) of the Healthcare Cost and Utilization Project (HCUP) to select patients who were admitted with a diagnosis of DFU between 2008 and 2014.
Data Cleaning
We focused on analyzing major lower extremity amputation in hospitalized adult patients with DFU. Records with key missing values were excluded from analysis. A total of 326 853 patients with DFU, age 18 years and older, who did not expire during their inpatient admission were selected as case candidates. A combination of literature review and clinician expertise generated a list of candidate variables with potential correlation to amputation along with their corresponding
Statistical Analysis
All analyses and figures were prepared using Statistical Package for the Social Sciences (SPSS) and RStudio. Specifically, multiple R packages were utilized in this study, including “foreign,” “dplyr,” “sqldf,” “party,” “partykit,” “grid,” “mvtnorm,” “modeltools,” “stats4,” “strucchange,” “randomForest,” “glmnet,” “cowplot,” and “ggplot2.” For descriptive statistics, the
Calculation
The data set was randomly divided, with 70% of records assigned to the training data set and 30% to the testing data set. As the major amputation cases are highly imbalanced within the NIS database, a decision tree model (CTREE) was used. CTREE is an unbiased recursive partitioning method with a permutation test performed at each pruning step. The nodes are split when the
Results
We identified a total of 326 853 patients in the NIS database between 2008 and 2014 who were admitted with DFU. Of these patients, 5.9% underwent major amputation. The majority of patients within the study were White (63.3%), followed by Black (18.9%) and Hispanic (12.8%). Medicare was the largest payer for patients with DFU, accounting for 59.1% of patients, followed by private insurance (19.1%), Medicaid (13.7%), and self-pay (5.0%). Patients who underwent major amputations were older (mean age 64.0 ± 13.2 years old,
Demographics.
Abbreviations: SD, standard deviation; BMI, body mass index; USD, United States Dollar.
Charges have been adjusted to 2019 prices using the annual consumer price index for inpatient hospital services.
Candidate contributing factors were compiled from previous publications and expert opinion. The individual odds ratios (ORs) were calculated for all the binary factors and are presented in Table 2. Gangrene was the factor most strongly associated with risk of major amputation (OR = 11.8,
Odds Ratio of Binary Variables.
Abbreviations: OR, odds ratio; CI, confidence interval.
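As an illustration of how an odds ratio and its confidence interval can be obtained for a single binary factor, the following minimal Python sketch uses the standard 2 × 2 table formula with a Wald interval on the log scale. The function name and the counts are hypothetical and are not taken from the study; the study's own analysis was performed in SPSS and R, and its exact interval method is not stated in the text.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_noncases,
                       unexposed_cases, unexposed_noncases, z=1.96):
    """Return (odds ratio, CI lower, CI upper) for a 2x2 table.

    Uses the Wald method on the log odds ratio; z=1.96 gives a 95% CI.
    """
    a, b, c, d = (exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 100 patients with the factor were amputated,
# versus 10 of 100 patients without it.
or_value, ci_low, ci_high = odds_ratio_with_ci(40, 60, 10, 90)
print(f"OR = {or_value:.1f} (95% CI = {ci_low:.2f}-{ci_high:.2f})")
```

With very large samples such as the NIS, intervals computed this way become narrow, so the magnitude of the OR is usually more informative than its statistical significance.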
The top five contributing variables to the predictive model (all with
Top Five and Top 10 Model Performance.
Abbreviations: AUC, area under the curve; NA, not applicable.
The AUC was nearly identical, at 0.84, for both the boosted and nonboosted models. The resulting receiver operating characteristic curve of the nonboosted five-variable model is presented in Figure 1.
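For readers less familiar with the AUC, it can be read as the probability that a randomly chosen patient who underwent amputation receives a higher predicted risk than a randomly chosen patient who did not. The minimal Python sketch below computes it directly from that definition. The labels and scores are hypothetical toy values, and the pairwise loop is meant only to show the idea; it would be too slow for a data set of the size used here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = event).
    scores: predicted risks, in the same order as labels.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three events and three non-events.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.6, 0.1, 0.4]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```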

Figure 1. Model performance: receiver operating characteristic curve using the five-factor model.
The decision tree model utilizing the top five most predictive factors is presented in Figure 2.

Figure 2. Decision tree model (CTREE).
The resulting CTREE model in Figure 2 has five levels of depth, 24 inner nodes, and 25 terminal nodes. The final model is an interactive scoring algorithm that generates a percent risk of amputation and also a risk multiplier of amputation when compared with a general patient admitted with DFU. The interactive model can be accessed through the link provided: https://grenut.shinyapps.io/amputation/
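The relationship between a percent risk and a risk multiplier can be illustrated with a short Python sketch. It assumes, as our reading rather than a statement from the source, that the comparison baseline is the overall 5.9% major amputation rate reported for inpatients with DFU. The function names and the example terminal-node risk are hypothetical, and the formula used by the online tool is not described in the text.

```python
# Overall major amputation rate among DFU inpatients, from the Results.
# Using it as the comparison baseline is an assumption of this sketch.
BASELINE_RISK = 0.059


def risk_multiplier(node_risk, baseline=BASELINE_RISK):
    """Ratio of a terminal node's predicted risk to the baseline risk."""
    if not 0.0 <= node_risk <= 1.0:
        raise ValueError("node_risk must be a probability between 0 and 1")
    if not 0.0 < baseline <= 1.0:
        raise ValueError("baseline must be a probability above 0")
    return node_risk / baseline


def percent_increase(node_risk, baseline=BASELINE_RISK):
    """Relative increase over baseline, expressed as a percentage."""
    return (risk_multiplier(node_risk, baseline) - 1.0) * 100.0


# Hypothetical terminal node with an 11.8% predicted risk.
hypothetical_node_risk = 0.118
multiplier = risk_multiplier(hypothetical_node_risk)
increase = percent_increase(hypothetical_node_risk)
print(f"{multiplier:.1f}x baseline, {increase:.0f}% increased risk")
```

Under this reading, a multiplier of 2 corresponds to a 100% increase in risk, so a percent increase and a multiplier describe the same comparison on different scales.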
Discussion
Our model is able to predict major amputation with 77.8% accuracy and has an AUC of 0.84 (95% CI = 0.83-0.85) using only five variables: gangrene, osteomyelitis, peripheral vascular disease, systemic infection, and weight loss. Gangrene was the single most important symptom associated with undergoing major lower extremity amputation. With gangrene alone, and none of the other variables, a patient has a 232.0% increased risk of having a major amputation. A recent meta-analysis on the risk of lower extremity amputations in patients with DFU also identified gangrene as the highest risk factor, with an OR of 9.9.25 Moreover, three of the top five factors used in our model represent signs of infection, demonstrating that the presence of infection is a powerful risk factor for amputation and provides another meaningful point of intervention for clinicians. The role of active infection in lower extremity amputation is well supported by the literature.25-27 Unsurprisingly, the presence of peripheral vascular disease (PVD) was associated with major amputation, which has also been identified in a multicenter observational study by Ugwu et al.26
Our model also found weight loss, defined by the AHRQ as
Although the application of machine learning to clinical medicine is in its infancy, many algorithms have been developed to predict clinical outcomes such as sepsis, dementia, readmission, and mortality postchemotherapy.20-23 To the best of our knowledge, this study represents the first attempt to use machine learning techniques to produce a prediction model of lower extremity amputation for patients with DFU based on clinical data from a national database. No dedicated prediction model for the risk of major lower extremity amputation of inpatients with diabetes exists. Previous wound classification systems are the closest modality clinicians have for predicting the risk of amputation. The current wound classification systems are based on expert consensus regarding symptoms and severity of atherosclerotic disease instead of robust support from clinical data. Furthermore, among the array of classification systems associated with lower extremity amputation, such as the Fontaine, Rutherford, and Bolinger systems, only the WIfI and Graziani classification systems were developed specifically for patients with diabetes.15,29,30 Although these classification systems categorize the severity of wounds, they have limited prognostic value. A recent meta-analysis consisting of 2669 patients with critical limb ischemia showed a positive correlation between the WIfI stage and the risk of amputation after one year. However, when only patients with diabetes were evaluated, Mathioudakis et al16 did not find WIfI to be prognostic for amputation.
Many machine learning prediction models are black boxes that do not offer an explanation for their results, which can lead to unintended consequences in high-stakes fields like criminal justice and healthcare.31 Thus, the decision tree model, a well-known classification method, was selected for this study because it provides transparency and allows clinicians to interpret the results themselves. In addition, the decision tree allows analysis of data with a mixture of real-valued and categorical features as well as data that may be missing some variables. Furthermore, the training of the decision tree model does not require compensating for imbalanced data samples by using techniques such as propensity score matching or oversampling. This characteristic made it ideal for the NIS database, which is an imbalanced binary data set. Thus, all the information within the data set is retained without manipulation, and the resulting model is based on the overall sample probability and not a regression. Specifically, the CTREE model, a type of decision tree, was selected because it has been shown to reduce overfitting and selection bias toward covariates with many possible splits better than the traditional decision tree models (CHAID, ID3, C4.5, CART, and QUEST).32,33 Last, boosting and random forest analysis were performed as a measure of internal validation of the data set. Whether using 10 variables, five variables with boosting, or a random forest analysis, the AUC remained the same.
This prediction algorithm is not only clinically relevant but also easily accessible. The final five-variable model offers accuracy similar to that of the 10-variable model but has the added benefit of using a smaller number of predictive variables. By using fewer variables, the concise five-variable model allows for easier use in the clinical setting and reduces the chance of input errors. In addition, we have provided access to the model on the web. Any clinician can answer the five short questions for a particular patient in seconds and immediately receive the calculated probability of a major amputation for that patient as compared with the average inpatient with a DFU. Unfiltered access to the algorithm will also allow individual clinicians to verify the predicted risk against real clinical outcomes.
During this study, we encountered some of the inherent limitations of machine learning. The quality of the database determines the quality of the subsequent analysis. We utilized the NIS database in this study because it is currently the most comprehensive publicly accessible patient database. However, the NIS database is dependent on manual reporting of patient information and diagnoses, and thus can be biased by human errors in ICD code entry. Furthermore, the resulting model is limited to factors that have a code within the NIS database. Thus, we may be excluding variables that lack a specific code but that are potentially important risk factors for lower extremity amputation, such as anatomic depth of wounds. In addition, the NIS database only contains a snapshot of a single hospital admission; therefore, it may overemphasize the contribution of acute disease processes, such as infection, over chronic illness. This lack of long-term patient follow-up data limits the sensitivity, specificity, and overall accuracy of the resulting prediction model. Last, machine learning models can suffer from overfitting when they are trained using a single database. Given our model’s high AUC of 0.84, overfitting was a potential concern. We attempted to mitigate this issue by using the CTREE model and by comparing the performance of randomly divided training and testing data.
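The train-versus-test comparison mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple random 70/30 split and a one-variable rule as a stand-in for the fitted tree. The records, the noise pattern, and the function names are hypothetical and do not reproduce the study's R workflow.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Randomly split records into training and testing lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, rows):
    """Fraction of (feature, label) rows that the rule classifies correctly."""
    correct = sum(1 for feature, label in rows if predict(feature) == label)
    return correct / len(rows)


# Hypothetical toy records: (has_gangrene, amputated). Every fifth patient
# has the factor and the outcome; five others have the outcome without it.
records = []
for i in range(100):
    has_factor = (i % 5 == 0)
    amputated = 1 if (has_factor or i % 20 == 7) else 0
    records.append((has_factor, amputated))

train, test = split_train_test(records)


def rule(has_factor):
    # Stand-in for a fitted model: predict amputation when the factor is present.
    return 1 if has_factor else 0


train_acc = accuracy(rule, train)
test_acc = accuracy(rule, test)
gap = abs(train_acc - test_acc)
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={gap:.3f}")
```

A large gap between training and testing performance would suggest overfitting, whereas similar values on both sets are reassuring but do not replace validation on an independent database.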
Another inherent limitation of machine learning is the Big Data Paradox, which holds that inaccurate conclusions of power and significance can be drawn due to both the large sample size (n) of big data and the loss of probabilistic sampling.34 This is an inherent problem for all machine learning research. Dr Meng, editor of the
With the adoption of electronic medical records and natural language processing, the amount of clinical data available is increasing at an exponential rate. This volume of data provides an exciting opportunity to further our understanding of clinical outcomes and enact interventions that greatly improve the health of our society. We have presented a novel algorithm derived from machine learning that predicts major lower extremity amputation in hospitalized patients with diabetes with 77.8% accuracy based on a national clinical database. These findings represent an important step in transforming medical decision-making from being rooted in expert consensus to being grounded in clinical data. The accuracy and applicability of machine learning in medicine will only improve as the quantity and quality of clinical data grow in the coming years.
Conclusion
Diabetic foot ulcer and the resulting lower extremity amputation are associated with considerable financial, emotional, and psychological burden in addition to poor survival prognosis. Thus, it is critical to identify high-risk groups and provide early intervention to reduce rates of complications. Utilizing machine learning methods, we have developed an algorithm based on an extensive national clinical database to predict the risk of major lower extremity amputation for inpatients with diabetes. This study represents the first attempt to leverage the power of machine learning in the treatment of vascular patients and highlights its potential to improve patient care.
Supplemental Material
Supplemental material, sj-docx-1-dst-10.1177_19322968221142899 for A Machine Learning Model for Prediction of Amputation in Diabetics by Stavros Stefanopoulos, Qiong Qiu, Gang Ren, Ayman Ahmed, Mohamed Osman, F. Charles Brunicardi and Munier Nazzal in Journal of Diabetes Science and Technology
Footnotes
Abbreviations
AHRQ, Agency for Healthcare Research and Quality; AUC, area under the curve; DFU, diabetic foot ulcer; HCUP, The Healthcare Cost and Utilization Project;
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
