Abstract
Background
As ground-level falls (GLFs) are a significant cause of mortality in elderly patients, field triage plays an essential role in patient outcomes. This research investigates how machine learning algorithms can supplement traditional t-tests to recognize statistically significant patterns in medical data and to inform clinical guidelines.
Methods
This is a retrospective study using data from 715 GLF patients over 75 years old. We first calculated P-values for each recorded factor to determine the factor’s significance in contributing to a need for surgery (P < .05 is significant). We then utilized the XGBoost machine learning method to rank contributing factors. We applied SHapley Additive exPlanations (SHAP) values to interpret the feature importance and provide clinical guidance via decision trees.
Results
The three factors with the most significant P-values when comparing patients with and without surgery were Glasgow Coma Scale (GCS) score (P < .001), no comorbidities (P < .001), and transfer-in status (P = .019). The XGBoost algorithm determined that GCS and systolic blood pressure contribute most strongly. The XGBoost model’s prediction accuracy on the test set of the train/test split was 90.3%.
Discussion
Compared with P-values, XGBoost provides more robust and detailed results regarding the factors that suggest a need for surgery. This demonstrates the clinical applicability of machine learning algorithms. Paramedics can use the resulting decision trees to inform medical decision-making in real time. XGBoost’s ability to generalize improves with more data, and the model can be tuned prospectively to assist individual hospitals.
Key Takeaways
• The key indicators of GLF severity identified by the machine learning XGBoost model (GCS, systolic blood pressure, diastolic blood pressure) differ from the factors identified using t-tests (GCS, no comorbidities, and transfer-in status), and the model built on them predicted the need for surgery with >90% accuracy.
• XGBoost may be a useful tool for augmenting the triage process with data that are easily available and already being recorded by first responders.
Introduction
Falls are one of the most common causes of traumatic injury in the United States and the leading cause of death in adults over 65 years. 1 Common risk factors for fall-related injuries or death, in addition to age over 65, include certain medical conditions such as diabetes mellitus and the use of medications such as anticoagulants and antiplatelets. 1 Kaiser et al report that up to 15% of annual emergency department visits are a result of fall-related injuries, and these injuries typically include traumatic brain injury, lacerations, or fracture(s). 2 Untreated fall injuries can be fatal in patients with the aforementioned risk factors. 2
Given the health burden and potentially permanent negative health outcomes associated with fall-related injuries, appropriate treatment of elderly fall victims is critical. 3 Appropriate treatment includes both an appropriate destination and adequate pre-hospital care. More specifically, after arriving on the scene, emergency medical services (EMS) personnel must evaluate the extent of the patient’s injuries during field triage. The result of this triage is the assignment of a destination according to the severity and mechanism of the patient’s injuries. 4 Because only Level 1 trauma centers provide 24-hour surgical services, proper field triage is critical to the survival of ground-level fall (GLF) victims with severe injuries. 5
Currently, EMS in the United States ubiquitously uses the American College of Surgeons and Centers for Disease Control and Prevention (CDC) guidelines for field triage. 6 These guidelines are regularly revised as clinical evidence reveals over-triage or under-triage in certain situations. 6 To construct and assess the guidelines, researchers have used traditional statistical methods such as t-tests and logistic regression models. 7
Machine learning may offer an improved method of assessing and refining field triage guidelines. XGBoost is a well-established, state-of-the-art classification algorithm for tabular datasets. 8 In recent literature, Wang et al used XGBoost as an interpretable classification model to predict patient outcomes in subarachnoid hemorrhage. 9 This is consistent with other research demonstrating that XGBoost significantly outperforms traditional statistical methods such as logistic regression in predicting outcomes.8-11
This study explores the utility of XGBoost in classifying surgery outcomes of GLFs to aid in the field triage of patients over 75 years old. We find that the factors the XGBoost model deems most important for classifying outcomes differ from the factors identified using t-tests, and that the model classifies outcomes with >90% accuracy. We report our findings using feature importance graphs and binary decision trees to aid clinical practice.
Methods
This research was exempt from Institutional Review Board review under the University of California, Irvine’s guidelines because it used retrospective data without identifiable patient information.
All GLF data were obtained retrospectively from trauma centers in Orange County, California: the Level 1 trauma center UCI and the Level 2 trauma centers Orange County Global Medical Center (OC) and Providence Mission Hospital Mission Viejo (Mission). All patient data were reported and collected by these participating hospitals. UCI data spanned 12/01/2016 to 09/30/2018, Mission reported all data from 2015, 2017, and 2018, and OC reported data from 2016 through 2018. Reported variables include the patient’s age, sex, date of arrival, pre-existing conditions, vital signs, hospital length of stay, discharge disposition, and outcome (living or deceased). Data from a total of 715 patients were reported; each patient was over 75 years old and had experienced a ground-level fall.
Each patient in the study is labeled with one of 2 outcomes: “surgery” or “no surgery.” The “surgery” category groups all types of surgery, including head and orthopedic surgery. We identify differences in feature distributions between the 2 groups using t-tests. We then use XGBoost to identify patterns in the data and determine which reported factors contribute most to patient outcome. In our experiments, we aggregated each patient’s data and passed it to the patient outcome classifier. All experiments were run in Python (version 3.8, Python Software Foundation) using the XGBoost package. Missing data were imputed using Multiple Imputation by Chained Equations (MICE). T-tests were performed with the SciPy statistics package, with statistical significance set at P < .05.
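The MICE step can be illustrated with a simplified sketch. The text does not say which MICE implementation was used; established options include `statsmodels.imputation.mice.MICEData` and scikit-learn’s `IterativeImputer`. The hypothetical function `chained_impute` below shows only the core chained-equations loop for 2 numeric columns: start each gap at its column mean, then repeatedly regress each column on the other and refill its missing entries. Unlike real MICE, it is deterministic and produces a single imputed dataset rather than several.

```python
from statistics import mean


def _fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # predictor is constant: fall back to the mean of y
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def chained_impute(col_a, col_b, n_iter=10):
    """Fill None entries in two numeric columns by chained regressions.

    Simplified, deterministic, single-imputation illustration of the
    chained-equations idea (hypothetical helper, not the study's code).
    """
    miss_a = [v is None for v in col_a]
    miss_b = [v is None for v in col_b]
    # Step 0: initialize each missing value with its column's observed mean.
    fill_a = mean(v for v in col_a if v is not None)
    fill_b = mean(v for v in col_b if v is not None)
    a = [fill_a if m else v for v, m in zip(col_a, miss_a)]
    b = [fill_b if m else v for v, m in zip(col_b, miss_b)]
    for _ in range(n_iter):
        # Regress A on B over rows where A was observed; refill A's gaps.
        rows = [i for i, m in enumerate(miss_a) if not m]
        ia, sa = _fit_line([b[i] for i in rows], [a[i] for i in rows])
        a = [ia + sa * b[i] if miss_a[i] else a[i] for i in range(len(a))]
        # Regress B on the updated A over rows where B was observed.
        rows = [i for i, m in enumerate(miss_b) if not m]
        ib, sb = _fit_line([a[i] for i in rows], [b[i] for i in rows])
        b = [ib + sb * a[i] if miss_b[i] else b[i] for i in range(len(b))]
    return a, b


# Hypothetical systolic/diastolic readings with one gap in each column.
sbp = [120, 135, None, 100, 150]
dbp = [70, 80, 65, None, 90]
sbp_filled, dbp_filled = chained_impute(sbp, dbp)
print(sbp_filled, dbp_filled)
```

Observed values are never overwritten; only the gaps are refilled on each pass, so each imputed blood pressure reflects its relationship with the other reading.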
Results
The study included a total of 715 patients who were treated for injuries following a GLF. The surgery and no-surgery groups were comparable with respect to age, sex, and overall health status. The mean age for both groups was 85.7 years ± .22 SEM, and the proportion of males was nearly identical in the surgery and no-surgery groups (50.7% vs 49.2%). No significant differences were found in comorbidities between the groups. However, GCS scores differed significantly (P < .001), with a mean of 12.4 ± .5 in the surgery group and 14.4 ± .05 in the non-surgery group. Transfer status also differed significantly between the groups, with a higher proportion of patients in the surgery group having been transferred from another facility (P = .019).
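As a rough consistency check on the GCS comparison, the t statistic can be approximated from the reported summary statistics alone. This sketch assumes the ± values for GCS are standard errors of the mean (the text labels only the age figure as SEM), uses a Welch-type statistic, and converts it to a P-value with a large-sample normal approximation. The study’s own test was run on patient-level data with SciPy (for example, `scipy.stats.ttest_ind`), so this is an illustration, not a reproduction.

```python
import math


def t_from_summary(mean_1, sem_1, mean_2, sem_2):
    """Welch-type t statistic computed from group means and their SEMs."""
    return (mean_1 - mean_2) / math.sqrt(sem_1 ** 2 + sem_2 ** 2)


def two_sided_p_normal(t):
    """Two-sided P-value using a large-sample normal approximation to t."""
    return math.erfc(abs(t) / math.sqrt(2))


# Reported GCS summaries, assuming the ± values are SEMs:
# no-surgery group 14.4 ± .05, surgery group 12.4 ± .5.
t = t_from_summary(14.4, 0.05, 12.4, 0.5)
p = two_sided_p_normal(t)
print(f"t = {t:.2f}, approximate two-sided P = {p:.1g}")
```

Under these assumptions t is roughly 4 and the approximate P-value falls well below .001, consistent with the reported P < .001. With a small surgery group, the exact t-distribution P-value would be somewhat larger than the normal approximation.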
To further analyze the data, we used both t-tests and XGBoost classification. The t-tests showed that GCS score and transfer status were significantly associated with patient outcome. However, XGBoost analysis found that low GCS, low systolic blood pressure (SBP), low diastolic blood pressure (DBP), previously diagnosed hypertension, and high heart rate were all predictive of the need for surgery. The relative magnitudes of each factor’s contribution to the need for surgery are shown in Figure 1. The XGBoost algorithm did not deem transfer status significant.

Figure 1. Feature importance in predicting the need for surgery, as determined by XGBoost analysis. GCS and systolic blood pressure were the most important factors, while diastolic blood pressure, hypertension, pulse, and low systolic blood pressure were also predictive of surgery. Transfer status was not deemed significant. “Field” values are those taken by EMS upon first interaction with the patient, while “Initial” values are recorded upon arrival to the emergency department.
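The abstract states that SHAP values were used to interpret feature importance. To show what a SHAP value measures, the sketch below computes exact Shapley values by averaging each feature’s marginal contribution over every ordering of the features. It makes simplifying assumptions: `toy_risk` is a hypothetical stand-in model (it reuses the GCS and systolic blood pressure thresholds reported for the decision tree, not the trained XGBoost ensemble), and absent features are filled from a single hypothetical baseline patient rather than averaged over a background dataset. For real tree ensembles, the `shap` library’s `TreeExplainer` computes these values efficiently.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    A feature 'present' in a coalition takes its value from x; an absent
    feature takes its value from baseline. Each feature's marginal
    contribution is averaged over all orderings (fine for a few features).
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]  # add feature j to the coalition
            new = model(current)
            phi[j] += new - prev
            prev = new
    return [total / factorial(n) for total in phi]


def toy_risk(features):
    """Hypothetical stand-in model: 1.0 if GCS < 8.5 or SBP < 99.5."""
    gcs, sbp = features
    return 1.0 if gcs < 8.5 or sbp < 99.5 else 0.0


patient = [7, 90]     # hypothetical patient: low GCS and low SBP
baseline = [15, 130]  # hypothetical reference patient
phi = shapley_values(toy_risk, patient, baseline)
print(phi)  # [0.5, 0.5]
```

The contributions sum to the difference between the patient’s score and the baseline’s score (1.0 here), which is the additivity property that lets SHAP values be read as a breakdown of a single prediction. Because either low GCS or low SBP alone triggers the rule, the credit is split evenly.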
We use 2 metrics to evaluate XGBoost’s performance: prediction accuracy and the area under the receiver operating characteristic (ROC) curve (AUC). Prediction accuracy is the number of cases classified correctly divided by the number of cases tested. The ROC curve plots a classifier’s true positive rate against its false positive rate across decision thresholds, and the AUC summarizes that curve as a single number. The AUC is particularly useful when the classes are imbalanced or when false positives and false negatives carry unequal costs. It ranges from 0 to 1, with .5 indicating a random classifier and 1 indicating a perfect classifier. XGBoost achieved a test prediction accuracy of 90.3% with a test AUC of .81 (see Figure 2), indicating good performance in predicting surgery requirements.

Figure 2. Receiver operating characteristic curve of the XGBoost classifier, showing the tradeoff between the false positive rate and the true positive rate. XGBoost predicts surgery requirements with a test area under the curve of .81.
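The 2 metrics can be made concrete with a short plain-Python sketch. The function names and the example labels and scores are hypothetical; in practice, `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` compute the same quantities. The AUC here uses its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), which equals the area under the empirical ROC curve.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count one half). O(n_pos * n_neg), which
    is fine for small illustrations."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = surgery) and model scores.
y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y, preds), roc_auc(y, scores))

# With imbalanced classes, a model that always predicts "no surgery"
# reaches 90% accuracy but has an AUC of only .5.
y_imbalanced = [0] * 9 + [1]
print(accuracy(y_imbalanced, [0] * 10), roc_auc(y_imbalanced, [0.0] * 10))
```

The second example shows why the AUC is worth reporting alongside accuracy: accuracy alone can look strong when one outcome is rare.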
The XGBoost model provides a feature-based binary decision tree (Figure 3), which gives simple, clear guidelines for determining the likelihood of surgery from GCS score and SBP. The split thresholds are not pre-assigned to the algorithm but are learned from patterns in the data. Specifically, the data used yield the following clinical guidelines for GLF patients:
• If GCS <8.5 or systolic blood pressure <99.5, classify as a poor outcome (surgery needed, treat at a Level 1 Trauma Center)
• If GCS >8.5 and systolic blood pressure ≥99.5, classify as a good outcome (surgery not needed)

Figure 3. Binary decision tree for predicting the need for surgery based on XGBoost analysis. The tree provides simple, clear guidelines for determining the likelihood of surgery based on the patient’s GCS score and systolic blood pressure.
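The two guidelines amount to a simple rule that can be written as a function. This is a hypothetical encoding for illustration: the function name and the GCS range check are additions, and SBP is assumed to be in mmHg. It reproduces only the thresholds stated in the guidelines and is not a validated triage tool.

```python
def tree_flags_poor_outcome(gcs, sbp):
    """Apply the two thresholds reported for the Figure 3 decision tree.

    Returns True for a predicted poor outcome (surgery needed, treat at a
    Level 1 Trauma Center) and False for a predicted good outcome.
    gcs: Glasgow Coma Scale total (3-15); sbp: systolic blood pressure (mmHg).
    """
    if not 3 <= gcs <= 15:
        raise ValueError("GCS total must be between 3 and 15")
    return gcs < 8.5 or sbp < 99.5


for gcs, sbp in [(7, 125), (14, 95), (15, 130), (9, 99.5)]:
    print(gcs, sbp, tree_flags_poor_outcome(gcs, sbp))
```

A patient with GCS 9 and SBP of exactly 99.5 falls on the good-outcome side, matching the ≥99.5 boundary in the second guideline. Because GCS totals are integers, the 8.5 split separates scores of 8 and below from scores of 9 and above.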

Discussion
Given the prevalence of fall-related injury and mortality, particularly in the elderly, patients with such injuries require prompt and appropriate care. While surgery is one of many reasons a patient may be taken to a Level 1 trauma center, it serves as a representative proxy for triage decision-making. Thus, identifying factors indicative of the need for surgical intervention aids in optimizing patient care. In this retrospective analysis of elderly patients with fall injuries, t-tests identified GCS score, no comorbidities, and transfer-in status as the three factors most significantly associated with the need for surgery. Moreover, the XGBoost algorithm identified low GCS and low systolic blood pressure as the primary predictors of the need for surgical intervention. The XGBoost model had a prediction accuracy of 90.3% on the test set of the train/test split, indicating its potential usefulness in the field triage process. Quick and accurate identification of fall victims who require surgical intervention not only promotes more rapid care for the critically injured but can also help alleviate the medical burden associated with emergency room overcrowding.1,2
Our study adds to the understanding of fall-related injuries in the elderly by demonstrating the potential value of machine learning in the field triage process. XGBoost identified factors that traditional statistical methods did not detect and produced an interpretable decision tree that could guide treatment decisions in clinical practice. The XGBoost results are more robust than the t-test results because XGBoost can account for interactions among multiple variables, whereas a t-test only compares the mean of a single variable between 2 groups. In this study, XGBoost achieved a prediction accuracy of 90.3% and an AUC of .81. Because t-tests provide a level of significance rather than predictions, they do not yield a comparable accuracy figure; this further illustrates the merit of XGBoost over traditional methods. Additionally, XGBoost can handle imbalanced datasets and make predictions for new patients, whereas t-tests are limited to comparing existing groups. This adaptability is particularly useful in high-stress situations that require swift decision-making, as XGBoost can guide the management of a variety of conditions.
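The point that a tree-based model can capture interactions that per-variable t-tests miss can be illustrated with a deliberately extreme toy example. The data and the hand-written tree are hypothetical (a fixed two-level tree stands in for a learned XGBoost tree), and real clinical interactions are rarely this clean. When 2 groups have equal means on a feature, the t statistic for that feature is 0, yet the 2 features together can still separate the groups perfectly.

```python
from statistics import mean

# Hypothetical toy data: (feature_a, feature_b, label). The label is 1
# exactly when the two binary features disagree (an interaction effect).
rows = [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1)] * 5


def group_means(rows, k):
    """Mean of feature k in the label-0 group and in the label-1 group."""
    return (mean(r[k] for r in rows if r[2] == 0),
            mean(r[k] for r in rows if r[2] == 1))


def depth2_tree(a, b):
    """Hand-written two-level tree: split on a, then on b."""
    if a < 0.5:
        return 1 if b >= 0.5 else 0
    return 1 if b < 0.5 else 0


# One variable at a time, the groups look identical (equal means, so a
# t-test on either feature alone finds no difference) ...
means_a = group_means(rows, 0)
means_b = group_means(rows, 1)
# ... yet a tree that combines both features classifies every row correctly.
tree_accuracy = sum(depth2_tree(a, b) == y for a, b, y in rows) / len(rows)
print(means_a, means_b, tree_accuracy)
```

Gradient-boosted trees such as XGBoost build many splits of this kind in sequence, which is how they can use combinations of variables that no single-variable comparison would flag.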
A lowered GCS score was one of the strongest contributors to the need for surgery. While GCS scores have limited predictive value when considered alone, studies have shown that they can predict adverse health outcomes in settings such as this one, where age and injury mechanism are controlled for.3,4 In fact, one 2021 study reported that pupil reactivity alone, assessed alongside the GCS examination, has predictive value in identifying patients at increased risk of mortality secondary to traumatic neurological injuries. 5 Moreover, XGBoost’s finding that GCS is strongly predictive further supports considering GCS score in the triage of elderly GLF patients. Beyond GCS, XGBoost also found that abnormal vital signs, including low systolic or diastolic blood pressure and tachycardia, as well as previously diagnosed hypertension, played a key role in determining which patients require surgery. XGBoost allows triage guidelines to incorporate the multiple variables most indicative of surgery; multivariate severity scores have been shown to improve predictive value. 6
The presence of a single Level 1 trauma center in Orange County, along with concerns about volume overload, 12 suggests a need for improved triage guidelines. As mentioned, this is especially true for falls, which are an exceptionally common mechanism of injury in elderly populations. Identifying which patients truly need Level 1 care may therefore alleviate the medical burden associated with such injuries. In the t-test analysis, transfer status was significantly associated with the need for surgery, which suggests that the choice of primary destination matters for GLF trauma patients. Because lower-tier centers lack resources to care for more critical patients, XGBoost-based tools could improve outcomes by helping route patients to centers according to the severity of their condition. Many of the studied variables lack predictive value on their own; however, machine learning can combine them to streamline the identification of high-risk patients and optimize outcomes.
This study is not without limitations. Our data came from trauma centers in a single county and may not be representative of other patient populations or clinical settings. The retrospective nature of this study also introduces inherent bias. The sample size was relatively small, which may have limited our ability to detect more subtle differences between the groups; however, a key feature of XGBoost is that it can be tailored to community-specific data and its accuracy can grow with the number of patients seen. It should also be noted that injuries after falls in vulnerable populations, particularly elderly populations, are underreported. 13 Additionally, because this method has not yet been tested in the field, the feasibility of implementing it is unclear, as some variables may not be identifiable during immediate triage.
In conclusion, our study suggests that XGBoost is an effective tool for predicting the need for surgery in elderly fall victims and guiding treatment decisions in the field triage process. This tool can augment the triage process with data that is easily available and already being recorded by first responders. Machine learning can compare multiple groups based on features and can guide the optimal classification of patient outcomes.
As a topic of future research, we believe that, given more feature-rich training data, XGBoost could predict length of stay and mortality and distinguish between different types of surgery. The ability to make accurate predictions about more outcomes would allow EMS to further augment their daily triage process with machine learning. As the algorithm processes higher volumes of data from increasingly diverse patients, the model can grow more complex to reflect the multifactorial decisions EMS teams make daily in the triage process. Further studies are needed to assess the efficacy and generalizability of XGBoost in emergency triage and to explore the utility of XGBoost in other clinical settings.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
