Abstract

Purpose

We aimed to use machine learning (ML) algorithms with clinical, lab, and imaging data as input to predict various outcomes in traumatic brain injury (TBI) patients.

Methods

In this retrospective study, blood samples were analyzed for glial fibrillary acidic protein (GFAP) and ubiquitin C-terminal hydrolase L1 (UCH-L1). Non-contrast head CTs were reviewed by two neuroradiologists for TBI common data elements (CDEs). Three outcomes were defined for prediction: discharged or admitted for further management (prediction 1), deceased or not deceased (prediction 2), and admission only, prolonged stay, or neurosurgery performed (prediction 3). Five ML models were trained, and SHapley Additive exPlanations (SHAP) analyses were used to assess the relative importance of the variables.

Results

Four hundred forty patients were included in predictions 1 and 2, and 271 patients in prediction 3; because prediction 3 required hospitalization, deceased and discharged patients could not be included. In cross-validation on the training sets, the Random Forest model achieved mean accuracies of 1.00, 0.99, and 0.93 for predictions 1, 2, and 3, respectively (test set accuracies of 0.95, 0.98, and 0.82). According to the SHAP analysis, key features were extracranial injury, hemorrhage, and UCH-L1 for prediction 1; the Glasgow Coma Scale, age, and GFAP for prediction 2; and GFAP, subdural hemorrhage volume, and pneumocephalus for prediction 3.

Conclusion

Combining clinical and laboratory parameters with non-contrast CT CDEs allowed our ML models to accurately predict the defined outcomes of TBI patients. GFAP and UCH-L1 were among the significant predictor variables, demonstrating the importance of these biomarkers.

Keywords

Traumatic brain injury, machine learning, predictive model, glial fibrillary acidic protein, ubiquitin C-terminal hydrolase, computed tomography

Introduction

According to the Global Burden of Disease Study 2016 and a subsequent 2018 study, the global incidence of traumatic brain injury (TBI) is estimated at 27–69 million cases each year.^1,2 There were 223,135 TBI-related hospitalizations in 2019 and 64,362 TBI-related deaths in 2020 in the United States, indicating that this is a significant public health issue with devastating outcomes.³ Considering the broad spectrum of its clinical manifestations and injury heterogeneity, as well as the high incidence and prevalence of TBI worldwide, prognostication has become increasingly important.

Research has sought to identify predictor variables, methods, and models that improve the precision of outcome prediction after TBI, which aids treatment decisions and the management of expectations.^4–10 The Glasgow Coma Scale (GCS) has been used for decades to promptly classify TBI severity, and it correlates with patient mortality and morbidity.¹¹ However, GCS is subject to interobserver variation and correlates poorly with mortality and morbidity at the favorable end of the spectrum.¹¹ Imaging is also crucial for identifying and prognosticating TBI: it identifies injuries that may require immediate procedural intervention or may benefit from early medical therapy or neurologic monitoring, and it helps determine patient prognosis.¹² Non-contrast head computed tomography (CT) in particular, which quantifies neuroparenchymal and bony injury, is vital for the diagnosis, prognostication, and triage of TBI in the acute phase.^11,12 Multiple scoring and classification systems, including the Marshall, Rotterdam, Helsinki, and Neuroimaging Radiological Interpretation System (NIRIS) scores, are based solely on non-contrast CT.^13–16 Moreover, a common data element (CDE) database for CT imaging was developed to facilitate the systematic characterization of the natural history and prognostic factors of TBI.¹⁷ In addition to these clinical and imaging tools, the diagnostic and prognostic capabilities of blood-based biomarkers, such as S100B, glial fibrillary acidic protein (GFAP), ubiquitin C-terminal hydrolase L1 (UCH-L1), interleukin 10, and amyloid β1-40, have been investigated.^18–20

As the number of identified prognostic factors increases, physicians must manage increasingly complex clinical, laboratory, and imaging data, which calls for more sophisticated analytical techniques. Deep learning and machine learning (ML)-based prediction models can use this large volume of data to build accurate prognostic models. Although such models process large amounts of data efficiently, interpretability concerns make clinicians hesitant to adopt them;²¹ deep learning models in particular are frequently described as “black boxes.”²² An interpretable ML-based predictive system incorporating clinical variables, blood biomarkers, and imaging biomarkers may improve prognostic prediction, triage, management, and treatment strategy in TBI patients. Therefore, in the present study, we aimed to use ML algorithms with clinical, laboratory, and imaging data as input to predict various outcomes in TBI patients, using the SHapley Additive exPlanations (SHAP) approach to establish model interpretability.

Materials and methods

Ethical considerations

The Institutional Review Board at Stanford University approved the study, which complied with the Health Insurance Portability and Accountability Act. Consent was obtained from patients or their legally authorized representatives in the subacute phase after trauma, once the initial workup was completed, for the use of blood collected as part of standard of care and for the collection of outcome data.

Patient selection

In this retrospective cohort study, all consecutive patients transported to the Stanford Healthcare Emergency Department by ambulance or helicopter, for whom a trauma alert was initiated according to established criteria²³ and who underwent non-contrast head CT for suspected TBI between November 2015 and April 2017, were evaluated for eligibility. The referenced criteria describe how and when the trauma alert is activated. The inclusion criteria were as follows: 1) patients over the age of 18 transported by ambulance or helicopter with an activated trauma alert; 2) patients who underwent non-contrast head CT for suspected TBI; and 3) patients with available blood biomarker results. Both penetrating and blunt trauma patients were included.

Data extraction

Demographic and clinical information was extracted from electronic medical records: age, gender, the time elapsed between trauma and admission, GCS, mechanism of TBI, other major intracranial injury (OMII; for example, a stroke that may have caused the trauma), and other major extracranial injury (OMEI) at the time of trauma. OMEI included fractures, cardiac diseases, pain and weakness, organ lacerations, operations, pneumothorax, and infections. The following follow-up data were also collected: disposition from the emergency department, disposition at discharge, duration of intensive care unit stay, and mortality.

Our Institutional Review Board (IRB) approved the use of blood that was collected for clinical care but not used for standard-of-care clinical analysis within 48 h. These samples were collected just before they would have been discarded per laboratory protocol. GFAP and UCH-L1 were measured using a sandwich enzyme-linked immunosorbent assay, and all blood samples were collected, processed, and analyzed using the same procedures. The lower limit of quantification was 3 pg/mL for GFAP (limit of detection not determined) and 14 pg/mL for UCH-L1 (limit of detection, 6 pg/mL). Specimens with signal levels exceeding the quantification range were diluted and retested.

Three outcomes were defined for prediction: discharged or admitted for further management (prediction 1); in-hospital mortality (deceased or not deceased [prediction 2]); and course of hospital stay (admission only, prolonged stay, or neurosurgery performed [prediction 3]). Prolonged stay was defined as a stay in an advanced care unit.

Imaging

Non-contrast head CT was performed within 30 min of admission. The scans were reviewed by two experienced neuroradiologists, M.W. and B.J., with 25 and 12 years of experience in neuroradiology, respectively, for the TBI CDEs developed by the National Institutes of Health.¹⁷ The presence or absence of the following CDEs was documented: skull fracture, pneumocephalus, hemorrhage, parenchymal injuries, and mass effect, herniation, or shift. The volumes of epidural hemorrhage, subdural hemorrhage, cerebral hematoma, and contusion were measured manually on consecutive CT slices: the volume contributed by each slice was calculated by multiplying the measured lesion area by the slice thickness, and the total lesion volume was obtained by summing the contributions of all slices on which the lesion was visible. Midline shift distance was also recorded, with a 5-mm cutoff used to define midline shift.
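The slice-summation rule above amounts to V = Σ (area of slice i × slice thickness). The Python sketch below illustrates that arithmetic only; the function name, the per-slice areas, and the 5-mm slice thickness in the example are hypothetical, and in the study the areas were traced manually by the neuroradiologists rather than computed with this code.

```python
from typing import Sequence

MM3_PER_CC = 1000.0  # 1 cc (1 mL) = 1000 mm^3


def lesion_volume_cc(slice_areas_mm2: Sequence[float], slice_thickness_mm: float) -> float:
    """Hypothetical helper: estimate lesion volume by summing per-slice contributions.

    Each slice on which the lesion is visible contributes
    (measured lesion area) x (slice thickness); contributions are summed
    and converted from mm^3 to cc.
    """
    if slice_thickness_mm <= 0:
        raise ValueError("slice thickness must be positive")
    total_mm3 = sum(area * slice_thickness_mm for area in slice_areas_mm2)
    return total_mm3 / MM3_PER_CC


# Hypothetical subdural hemorrhage traced on five consecutive 5-mm slices.
areas = [120.0, 340.0, 410.0, 260.0, 90.0]
volume = lesion_volume_cc(areas, slice_thickness_mm=5.0)
print(f"{volume:.1f} cc")  # 6.1 cc
```

Because each slice is treated as a flat slab, the estimate depends on slice thickness; thinner slices approximate curved lesion boundaries more closely.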

Predictor variables

Predictor variables included age, gender, GCS, mechanism of TBI, the time elapsed between trauma and admission, OMII, OMEI, GFAP, UCH-L1, skull fracture, pneumocephalus, hemorrhage, parenchymal injuries, mass effect, herniation, or shift, and the volumes of epidural hemorrhage, subdural hemorrhage, cerebral hematoma, and contusion. CT CDEs and novel blood biomarkers were added to the traditional predictor variables in TBI, such as GCS, to evaluate their predictive value in ML models.

Machine learning models

All analyses were performed in Python (version 3.7). Before model training, a correlation matrix of the TBI features was computed to identify potential correlations between features. For prediction 1 (discharged or admitted) and prediction 2 (deceased or not deceased), the cohort was randomly divided into a training set (75%, 330 patients) and a testing set (25%, 110 patients). For prediction 3 (admission only, prolonged stay, or neurosurgery performed), the cohort was randomly divided into a training set (80%, 215 cases) and a testing set (20%, 56 cases). Five ML models (XGBoost, Random Forest, decision tree, support vector machine, and logistic regression) were trained on the training sets and their performances compared. Five-fold cross-validation was used during training to reduce overfitting and improve robustness: in each iteration, four folds of the training set were used for training and the remaining fold for validation. Hyperparameters were tuned using a grid search, and the model with the best average performance across the cross-validation folds was carried forward to testing. The relative importance of the predictor variables was evaluated using SHAP. SHAP values represent both the magnitude and the direction of the association between each feature and the outcome, and a matrix of SHAP values visualizes the individual contribution of each feature to each model prediction, making the role of each feature easier to interpret.
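The workflow above (random 75%/25% split, grid search with five-fold cross-validation, and selection of the model with the best mean validation accuracy) can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' code: the data are synthetic (`make_classification`, with class weights loosely mimicking the 157/283 split), the hyperparameter grids are hypothetical, folds are stratified as an assumption, and `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a 440-patient cohort with 18 predictors (NOT study data).
X, y = make_classification(n_samples=440, n_features=18, n_informative=8,
                           weights=[0.36, 0.64], random_state=0)

# Pairwise Pearson correlations between features (18 x 18 matrix).
feature_corr = np.corrcoef(X, rowvar=False)

# Random 75%/25% split -> 330 training and 110 testing patients.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Candidate models with small, hypothetical hyperparameter grids.
# GradientBoostingClassifier is a stand-in for XGBoost here.
candidates = {
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [100, 200], "max_depth": [None, 5]}),
    "gradient_boosting": (GradientBoostingClassifier(random_state=0),
                          {"learning_rate": [0.05, 0.1]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0),
                      {"max_depth": [3, 5, None]}),
    "svm": (make_pipeline(StandardScaler(), SVC()),
            {"svc__C": [0.1, 1.0, 10.0]}),
    "logistic_regression": (make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
                            {"logisticregression__C": [0.1, 1.0, 10.0]}),
}

# Five folds: each round trains on four folds and validates on the fifth.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_accuracy, best_models = {}, {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)
    cv_accuracy[name] = search.best_score_        # mean validation accuracy over 5 folds
    best_models[name] = search.best_estimator_    # refit on the full training set

# Only the model with the best mean cross-validation accuracy is tested.
best_name = max(cv_accuracy, key=cv_accuracy.get)
test_accuracy = best_models[best_name].score(X_test, y_test)
print(best_name, round(cv_accuracy[best_name], 3), round(test_accuracy, 3))
```

Keeping the test set out of every grid-search and cross-validation step is what makes the final test accuracy an honest estimate; the gap between cross-validation and test accuracy in Table 3 is the kind of difference this separation exposes.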

Statistical analysis

Fisher’s exact test and the Chi-square test were used to compare categorical data, such as gender. The Kruskal–Wallis and Wilcoxon rank sum tests were used to compare continuous data. Agreement between model predictions and the ground truth was assessed using Cohen’s Kappa coefficient, with quadratic weighting for the three-class prediction 3 and without weighting for the binary predictions. All statistical analyses were conducted in R (version 4.1.0) using RStudio. The level of statistical significance was set at 0.05 for all analyses.
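The authors ran these analyses in R. As a hedged Python illustration, SciPy provides equivalents (`scipy.stats.fisher_exact`, `chi2_contingency`, `kruskal`, and `ranksums`), and scikit-learn's `cohen_kappa_score` accepts `weights="quadratic"`. The sketch below applies Fisher's exact test to the prediction 1 gender counts from Table 1 and computes a quadratically weighted kappa from scratch on hypothetical three-class labels, cross-checking it against scikit-learn. Treating the classes as ordered (admission only < prolonged stay < neurosurgery performed) is an assumption required for quadratic weights to be meaningful.

```python
import numpy as np
from scipy.stats import fisher_exact
from sklearn.metrics import cohen_kappa_score

# Fisher's exact test on the Table 1 gender counts for prediction 1
# (rows: male, female; columns: discharged, admitted).
gender_counts = [[100, 174], [57, 109]]
odds_ratio, p_gender = fisher_exact(gender_counts)  # two-sided by default


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights (i - j)^2 / (k - 1)^2."""
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    observed /= observed.sum()
    # Agreement expected by chance from the row (true) and column (predicted) marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_classes, n_classes))
    weights = (i - j) ** 2 / (n_classes - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Hypothetical labels: 0 = admission only, 1 = prolonged stay, 2 = neurosurgery performed.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 0, 1]
y_pred = [0, 0, 1, 1, 1, 0, 2, 1, 0, 1]
kappa = quadratic_weighted_kappa(y_true, y_pred, n_classes=3)
print(round(p_gender, 2), round(kappa, 3))

# scikit-learn's implementation gives the same value.
assert np.isclose(kappa, cohen_kappa_score(y_true, y_pred, weights="quadratic"))
```

Quadratic weighting penalizes confusing admission only with neurosurgery four times as heavily as confusing adjacent classes, which suits an ordinal outcome such as hospital course.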

Results

The initial screening included 662 patients admitted to the emergency department with a trauma alert and an available non-contrast head CT scan. Eight patients under the age of 18, four with restricted records, and four with underlying brain pathologies unrelated to TBI, such as brain tumors, were excluded. A further 182 patients were excluded for lack of blood samples and 24 for absent GFAP and UCH-L1 test results, leaving 440 patients in the study. This patient cohort was also used in another study.²⁴ All 440 patients were used for prediction 1 (discharged or admitted for further management) and prediction 2 (deceased or not deceased), and 271 patients were used for prediction 3 (admission only, prolonged stay, or neurosurgery performed). Figure 1 depicts patient selection. Figure 2 shows the correlation matrix of the TBI features; GFAP and contusion volume showed a relatively high positive correlation.
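The patient counts above can be checked with simple arithmetic. The sketch below assumes, as the text implies but does not state explicitly, that all 12 deceased patients were among the 283 admitted patients; under that assumption, removing the discharged and deceased patients reproduces the 271-patient prediction 3 cohort and the Table 2 group sizes.

```python
screened = 662
exclusions = {
    "under 18 years": 8,
    "restricted records": 4,
    "brain pathology unrelated to TBI": 4,
    "no blood sample": 182,
    "no GFAP/UCH-L1 result": 24,
}
included = screened - sum(exclusions.values())
print(included)  # 440

# Prediction 3 needs a hospital course, so discharged (157) and deceased (12)
# patients are removed. Assumption: every deceased patient was also among
# the admitted patients, so the two groups do not overlap.
discharged, deceased = 157, 12
prediction3_cohort = included - discharged - deceased
print(prediction3_cohort)  # 271

# Consistency check against the Table 2 group sizes
# (admission only, prolonged stay, neurosurgery performed).
assert prediction3_cohort == 141 + 120 + 10
```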

Figure 1.

Patient selection.

Figure 2.

The correlation matrix of the included features.

Prediction 1 and 2

Table 1 displays the characteristics of the patient population, by outcome group and in total. In the prediction 1 cohort, 157 patients were discharged and 283 were admitted for further management. In the same cohort, 12 patients were deceased and 428 were not deceased (prediction 2). Discharged and admitted patients differed significantly in age, GCS, TBI mechanism, GFAP, UCH-L1, subdural hemorrhage volume, and the presence of OMII, OMEI, skull fracture, pneumocephalus, hemorrhage, parenchymal injuries, and mass effect, herniation, or shift. Deceased and non-deceased patients differed significantly in age, GCS, GFAP, UCH-L1, subdural hemorrhage volume, and the presence of pneumocephalus, hemorrhage, parenchymal injuries, and mass effect, herniation, or shift; the difference in gender did not reach statistical significance (p = 0.06).

Table 1.

Characteristics of the patient population used in predictions 1 and 2.

Features	All patients (n = 440)	Prediction 1: discharged (n = 157)	Prediction 1: admitted (n = 283)	Prediction 1: p value	Prediction 2: not deceased (n = 428)	Prediction 2: deceased (n = 12)	Prediction 2: p value
Age (y)	48.5 ± 22.9	39.0 ± 19.9	55.0 ± 23.5	<0.001^a	48.0 ± 22.7	80.5 ± 13.9	<0.001^a
Gender
Male	274	100	174	0.68^b	270	4	0.06^b
Female	166	57	109		158	8
GCS	15(14, 15)	15(15, 15)	15(14, 15)	<0.001^c	15(14, 15)	10(5, 12)	<0.001^a
TBI mechanism
Fall	170	40	130	0.002^c	161	9	0.07^c
Traffic	237	98	139		234	3
Violence	25	15	10		25	0
Others	8	4	4		8	0
Time (m)	60 ± 1777	60 ± 1421	60 ± 1949	0.50^a	60 ± 1797	60 ± 810	0.61^a
OMII	35	3	32	<0.001^b	35	0	0.62^b
OMEI	195	15	180	<0.001^b	187	8	0.14^b
GFAP (pg/mL)	38.8 ± 1723.3	21.0 ± 69.4	55.0 ± 2139.5	<0.001^a	37.0 ± 1696.2	134.4 ± 2462.9	<0.001^a
UCH–L1 (pg/mL)	796.1 ± 1574.4	594.8 ± 599.8	957.8 ± 1855.2	<0.001^a	782.0 ± 1574.5	1881.9 ± 1409.3	0.01^a
Skull fracture	35	0	35	<0.001^b	34	1	1.0^b
Pneumocephalus	66	0	66	<0.001^b	59	7	<0.001^b
Hemorrhage	69	0	69	<0.001^b	64	5	0.03^b
Parenchymal injuries	51	0	51	<0.001^b	47	4	0.04^b
Mass effect, herniation, or shift	25	0	25	<0.001^b	19	6	<0.001^b
EDH volume (cc)	0.0 ± 0.8	0.0 ± 0.0	0.0 ± 1.0	0.67^a	0.0 ± 0.8	0.0 ± 0.0	0.69^a
SDH volume (cc)	0.0 ± 20.2	0.0 ± 0.0	0.0 ± 25.0	<0.001^a	0.0 ± 19.0	0.3 ± 45.3	0.01^a
Hematoma volume (cc)	0.0 ± 11.2	0.0 ± 0.0	0.0 ± 14.0	0.30^a	0.0 ± 1.2	0.0 ± 60.3	0.06^a
Contusion volume (cc)	0.0 ± 3.3	0.0 ± 0.0	0.0 ± 4.0	0.14^a	0.0 ± 3.3	0.0 ± 0.0	0.74^a

n: number; y: years; GCS: Glasgow Coma Scale; TBI: traumatic brain injury; m: minutes; OMII: other major intracranial injury; OMEI: other major extracranial injury; GFAP: glial fibrillary acidic protein; UCH–L1: ubiquitin C-terminal hydrolase L1; EDH: epidural hemorrhage; cc: cubic centimeter; SDH: subdural hemorrhage.

^a p value calculated using the Wilcoxon rank sum test.

^b p value calculated using Fisher's exact test.

^c p value calculated using the Chi-square test.

During the training phase with cross-validation, Random Forest models achieved an average accuracy of 1.00 with a Kappa value of 0.99 for prediction 1 and an accuracy of 0.99 with a Kappa value of 0.82 for prediction 2. The average accuracy of XGBoost was 0.98 and 0.97 for predictions 1 and 2, respectively, making it the second-best performer. The average accuracies of the decision tree, support vector machines, and logistic regression models were below 0.90. Based on the initial results, only Random Forest models were used in the testing stage and analyzed with SHAP.

In the testing stage, the model for prediction 1 (discharged or admitted for further management) achieved an accuracy of 0.95 and a Kappa value of 0.88, the highest agreement among the three predictions. The model for prediction 2 (deceased or not deceased) achieved an accuracy of 0.98, with a Kappa value of 0.49.

According to the results of the SHAP analyses, the five most important features for prediction 1 were, in descending order: OMEI, hemorrhage, UCH-L1, OMII, and age. The top five features for prediction 2 were GCS, age, GFAP, UCH-L1, and mass effect, herniation, or shift, in descending order. Figure 3 depicts the bar plots and beeswarm plots for the results of SHAP analyses for predictions 1 and 2.
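A SHAP bar plot ranks features by their mean absolute SHAP value across patients, while a beeswarm plot shows the individual per-patient values. The sketch below computes exact Shapley values by brute-force enumeration of feature coalitions for a hypothetical three-feature scoring function (`toy_model`, a stand-in for a classifier's predicted probability, not the study's Random Forest), using a hypothetical background sample to define the value of each coalition, and then ranks features the way a bar plot does. Exact enumeration scales as 2^n and is only practical for a few features; for tree ensembles, the shap library's `TreeExplainer` computes these attributions efficiently.

```python
from itertools import combinations
from math import factorial

import numpy as np


def toy_model(X):
    """Hypothetical risk score in [0, 1]; a stand-in for predict_proba, NOT the study model."""
    z = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 0] * X[:, 2]
    return 1.0 / (1.0 + np.exp(-z))


def exact_shapley(model, x, background):
    """Exact Shapley values for one instance x.

    The value of a coalition S is the model output averaged over the background
    rows after replacing the features in S with the instance's own values.
    """
    n = x.shape[0]

    def value(coalition):
        X = background.copy()
        idx = list(coalition)
        if idx:
            X[:, idx] = x[idx]
        return model(X).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))   # hypothetical reference sample
patients = rng.normal(size=(20, 3))     # hypothetical patients to explain

# One row per patient, one column per feature (the matrix a beeswarm plot displays).
shap_matrix = np.array([exact_shapley(toy_model, x, background) for x in patients])

# Efficiency: each row sums to f(x) minus the mean background prediction.
assert np.allclose(shap_matrix.sum(axis=1),
                   toy_model(patients) - toy_model(background).mean())

# Bar-plot ranking: mean absolute SHAP value per feature, most important first.
importance = np.abs(shap_matrix).mean(axis=0)
ranking = np.argsort(importance)[::-1]
print(ranking, importance.round(3))
```

The sign of each entry in `shap_matrix` gives the direction of a feature's effect for that patient, which is what the color-coded beeswarm plots in Figure 3 convey.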

Figure 3.

(a) Bar and (b) beeswarm plots of the features for the result of SHAP analysis in prediction 1, and (c) bar and (d) beeswarm plots of the features for the result of SHAP analysis in prediction 2.

Prediction 3

Table 2 displays the characteristics of the patient population, both by outcome groups and in total. Since prediction 3 involves a hospital course prediction, deceased and discharged patients could not be utilized. In our prediction 3 cohort, 141 patients were admitted only, 120 had a prolonged stay, and 10 underwent neurosurgery. Between groups, there were statistically significant differences in age, GCS, GFAP, UCH-L1, epidural hemorrhage volume, subdural hemorrhage volume, contusion volume, and presence of OMII, OMEI, skull fracture, pneumocephalus, hemorrhage, parenchymal injuries, and mass effect, herniation, or shift.

Table 2.

Characteristics of the patient population used in prediction 3.

Features	All patients (n = 271)	Prediction 3: admission only (n = 141)	Prediction 3: prolonged stay (n = 120)	Prediction 3: neurosurgery performed (n = 10)	p-value
Age (y)	54.0 ± 23.4	47.0 ± 22.6	60.5 ± 23.1	49.5 ± 28.4	0.01^a
Gender
Male	170	84	81	5	0.29^b
Female	101	57	39	5
GCS	15(14, 15)	15(15, 15)	15(14, 15)	10(5, 13)	<0.001^a
TBI mechanism
Fall	121	55	60	6	0.22^b
Traffic	136	77	54	4
Violence	10	7	3	0
Others	4	2	2	0
Time (m)	60 ± 1985	60 ± 501	60 ± 532	83 ± 9654	0.35^a
OMII	32	24	8	0	0.04^b
OMEI	172	102	67	3	0.002^b
GFAP (pg/mL)	53.0 ± 2123.8	29.2 ± 78.7	116.6 ± 960.0	55.8 ± 10,187.7	<0.001^a
UCH–L1 (pg/mL)	950.7 ± 1871.2	789.2 ± 1134.2	1219.7 ± 1889.2	1193.9 ± 5384.2	0.002^a
Skull fracture	34	6	24	4	<0.001^b
Pneumocephalus	54	3	48	8	<0.001^b
Hemorrhage	64	13	48	3	<0.001^b
Parenchymal injuries	47	6	35	6	<0.001^b
Mass effect, herniation, or shift	19	1	9	9	<0.001^b
EDH volume (cc)	0.0 ± 1.0	0.0 ± 0.0	0.0 ± 1.4	0.0 ± 2.4	0.02^a
SDH volume (cc)	0.0 ± 23.7	0.0 ± 0.2	0.0 ± 12.6	51.8 ± 89.5	<0.001^a
Hematoma volume (cc)	0.0 ± 1.5	0.0 ± 0.0	0.0 ± 2.3	0.0 ± 0.0	0.15^a
Contusion volume (cc)	0.0 ± 4.1	0.0 ± 0.1	0.0 ± 4.3	0.0 ± 14.7	<0.001^a

n: number; y: years; GCS: Glasgow Coma Scale; TBI: traumatic brain injury; m: minutes; OMII: other major intracranial injury; OMEI: other major extracranial injury; GFAP: glial fibrillary acidic protein; UCH–L1: ubiquitin C-terminal hydrolase L1; EDH: epidural hemorrhage; cc: cubic centimeter; SDH: subdural hemorrhage.

^a p-values calculated using the Kruskal–Wallis test.

^b p-value calculated using the Chi-square test.

In the training phase with cross-validation, the Random Forest model achieved a mean accuracy of 0.93 and a Kappa value of 0.88 for prediction 3. XGBoost's average accuracy for prediction 3 was 0.92, making it the second-best model. The average accuracies of the decision tree, support vector machines, and logistic regression models were less than 0.90. Based on these results, only the Random Forest model was utilized and analyzed with SHAP during the testing phase. In the testing phase, the accuracy of the model for prediction 3 was 0.82, with a Kappa value of 0.72.

Because prediction 3 is not binary but has three distinct outcome classes, the importance of the features differs for each class. Overall, the five most important features for prediction 3 according to the SHAP analyses were, in descending order, GFAP, subdural hemorrhage volume, pneumocephalus, UCH-L1, and hemorrhage. For admission only and prolonged stay, the top five features were the same, except that UCH-L1 and pneumocephalus swapped places for prolonged stay. The top five predictors of neurosurgery performed differed from the overall results and were, in descending order: pneumocephalus; mass effect, herniation, or shift; GCS; GFAP; and subdural hemorrhage volume. Figure 4 depicts the bar and beeswarm plots for the SHAP analyses for prediction 3.

Figure 4.

(a) Bar plot of the features for the result of SHAP analysis in prediction 3, and beeswarm plots of the SHAP analyses for predicting (b) admission only, (c) prolonged stay, and (d) neurosurgery performed.

Table 3 summarizes the performance of the Random Forest models in detail. Supplementary Figure 1 displays the confusion matrices for each prediction, and Supplementary Figures 2–4 show example cases from our study that illustrate the application of the machine learning model.

Table 3.

Performances of Random Forest models.

Predictions	Set	Kappa value	Standard error	95% CI lower	95% CI upper	ACC	SEN	SPE	PPV	NPV
Prediction 1	Training (n = 330)	0.99	0.01	0.98	1.00	1.00	1.00	0.99	1.00	1.00
Prediction 1	Testing (n = 110)	0.88	0.05	0.78	0.97	0.95	0.96	0.92	0.96	0.92
Prediction 2	Training (n = 330)	0.82	0.09	0.70	1.00	0.99	0.78	1.00	1.00	0.93
Prediction 2	Testing (n = 110)	0.49	0.36	0.00	1.00	0.98	0.33	1.00	1.00	0.98
Prediction 3	Training (n = 215)	0.88^a	0.07	0.74	1.00	0.93	—	—	—	—
Prediction 3	Testing (n = 56)	0.72^a	0.15	0.43	1.00	0.82	—	—	—	—

CI: confidence interval; ACC: accuracy; SEN: sensitivity; SPE: specificity; PPV: positive predictive value; NPV: negative predictive value; n: number.

^a Weighted Kappa value.

Discussion

This study presents a series of ML models that accurately predict, in TBI patients, the first decision at the emergency department (discharged or admitted for further management), in-hospital mortality, and hospital course (admission only, prolonged stay, or neurosurgery performed). We chose these outcomes to assess the value of CT CDEs and novel blood biomarkers in real-world clinical scenarios, with the goal of improving prognostication in TBI. Random Forest was the best-performing model for prediction 1 (discharged or admitted for further management), prediction 2 (deceased or not deceased), and prediction 3 (admission only, prolonged stay, or neurosurgery performed), with test set accuracies of 0.95, 0.98, and 0.82, respectively.

Predicting TBI outcomes with prognostic models, deep learning, or ML is not a novel concept. In the context of the International Mission for Prognosis and Clinical Trials in TBI (IMPACT) model, our study investigates the potential of ML algorithms that incorporate clinical, laboratory, and imaging data to predict outcomes. While the IMPACT model is well established and focuses on clinical, imaging, and demographic factors, our study aims to supplement it with blood biomarkers and additional predictors. Other studies have described models predicting functional outcome or the Glasgow Outcome Scale-Extended,^5,25–29 and more recent studies have used images as inputs.^8,30,31 Studies similar to ours have described models predicting in-hospital mortality,^29,32–35 early mortality,^36–38 discharge disposition,^39,40 need for hospital admission,⁶ emergency neurosurgery,⁴¹ and length of hospital stay.⁴ Beyond demonstrating the efficacy of ML for predicting multiple outcomes simultaneously in TBI patients, this study is distinctive in using blood biomarkers, such as GFAP and UCH-L1, and non-contrast CT CDEs as input variables. Although other studies have used CT data as input variables, the use of CDEs sets this study apart. CDEs were established to promote consistent nomenclature and criteria for defining intracranial injuries across all imaging examinations; we therefore believe their use is crucial.

In our study, the model for prediction 1 (discharged or admitted for further management) achieved a test set accuracy of 0.95, a sensitivity of 0.96, a specificity of 0.92, a positive predictive value (PPV) of 0.96, and a negative predictive value (NPV) of 0.92. Marincowitz et al. predicted a related outcome, the need for hospital admission, which they defined as a deterioration measure intended to capture the need for admission.⁶ In that study, the model had an accuracy of 0.32, a sensitivity of 0.99, a specificity of 0.07, a PPV of 0.29, and an NPV of 0.94. Their five most predictive factors were injury severity on CT (Modified Marshall Criteria), GCS, number of injuries, the admitting hospital, and subdural hemorrhage. Similarly, in our study, OMEI and OMII (both analogous to number of injuries) and hemorrhage were among the top five predictive factors for admission or discharge. Thus, although the outcomes were not identical, our results align with theirs. In addition, UCH-L1 was among the top five predictive factors in our study, indicating the importance of this biomarker.

Our model for prediction 2 (in-hospital mortality) yielded a test set accuracy of 0.98, a sensitivity of 0.33, a specificity of 1.00, a PPV of 1.00, and an NPV of 0.98. In the testing set, only three of 110 patients were deceased, which may be the primary reason for the low sensitivity. The best-performing model of Matsuo et al. in the test set showed a sensitivity of 0.88, a specificity of 0.88, and an accuracy of 0.89 in predicting in-hospital mortality.²⁹ Furthermore, the best-performing model of Abujaber et al. in the test set yielded an accuracy of 0.96, a sensitivity of 0.73, a specificity of 0.99, a PPV of 0.88, and an NPV of 0.97.³² Moreover, the best-performing model of Hsu et al. in the test set yielded an accuracy of 0.93, a PPV (precision) of 0.93, and a sensitivity (recall) of 0.93.³³ Finally, the best-performing model of Rau et al. was artificial neural network-based and yielded an accuracy of 0.92, a sensitivity of 0.84, and a specificity of 0.93 in the test set.³⁴ While it is not ideal to directly compare models based on these results, our model produced comparable outcomes, except for sensitivity, due to the small number of deceased patients in our test set. As in our study, GCS was the most or second most significant factor in three of these studies.^29,33,34 UCH-L1 and GFAP were among our study’s top five most predictive factors, again indicating these biomarkers' significance.
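The gap between the 0.98 accuracy and the 0.49 Kappa value for prediction 2 follows from the class counts. With three deceased patients among 110 in the test set, the reported sensitivity of 0.33 and specificity of 1.00 imply one true positive, two false negatives, no false positives, and 107 true negatives; this matrix is inferred from Table 3 rather than read from Supplementary Figure 1. The sketch below recomputes the Table 3 metrics from that inferred matrix and compares them with a hypothetical baseline that labels every patient "not deceased".

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV, NPV, and Cohen's kappa from a 2x2 matrix."""
    n = tp + fp + tn + fn
    nan = float("nan")
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    ppv = tp / (tp + fp) if tp + fp else nan
    npv = tn / (tn + fn) if tn + fn else nan
    # Chance agreement from the predicted and actual class proportions.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "kappa": kappa}


# Confusion matrix inferred from Table 3 for the prediction 2 test set (n = 110).
model = binary_metrics(tp=1, fp=0, tn=107, fn=2)
print({k: round(v, 2) for k, v in model.items()})
# {'accuracy': 0.98, 'sensitivity': 0.33, 'specificity': 1.0, 'ppv': 1.0, 'npv': 0.98, 'kappa': 0.49}

# Hypothetical baseline that always predicts "not deceased".
baseline = binary_metrics(tp=0, fp=0, tn=107, fn=3)
print(round(baseline["accuracy"], 2), baseline["kappa"])  # 0.97 0.0
```

A classifier that never predicts death would still reach 0.97 accuracy on this test set but a Kappa value of 0, which is why Kappa and sensitivity are more informative than accuracy for the imbalanced mortality outcome.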

Our model for prediction 3 (course of hospital stay [admission only, prolonged stay, or neurosurgery performed]) achieved a test set accuracy of 0.82. Moyer et al. similarly predicted the need for emergency neurosurgery within 24 h of admission;⁴¹ however, our outcome was not binary but had three classes. Pneumocephalus, mass effect, herniation or shift, GCS, and parenchymal injuries were among the top five variables for predicting the need for neurosurgery, highlighting the significance of imaging. GFAP was also among the top five predictive factors for the need for neurosurgery.

Bazarian et al. demonstrated the high sensitivity and NPV of the UCH-L1 and GFAP tests for predicting the absence of intracranial lesions on head CT scans.¹⁸ While this supports their potential role in avoiding unnecessary CT scans in TBI patients in emergency departments, our results further indicate that their levels can help predict the hospital course of TBI patients. Other studies have also used these biomarkers to predict the prognosis of TBI patients. Korley et al. demonstrated that day-of-injury plasma concentrations of GFAP and UCH-L1 have good to excellent predictive value for death and unfavorable outcomes, particularly in patients with a GCS score of 3 to 12.⁴² Helmrich et al. showed that serum biomarkers, particularly UCH-L1, provide incremental prognostic value for functional outcome prediction after TBI when combined with established prognostic models.⁴³ Further studies have highlighted the diagnostic and prognostic potential of protein biomarkers, specifically GFAP and UCH-L1, in TBI patients.^44,45 Meta-analyses and longitudinal studies have shown that GFAP and UCH-L1 can aid in detecting intracranial lesions, predicting unfavorable outcomes, and guiding therapeutic interventions.^44,46 Moreover, evaluating multiple biomarkers with distinct cellular origins can improve outcome prediction models, underscoring the value of incorporating these biomarkers into the evaluation of TBI patients.⁴⁵

Numerous studies have also proposed cutoff values based on associations between elevated GFAP and UCH-L1 levels and TBI diagnosis or prognosis. Papa et al. demonstrated that a UCH-L1 cutoff of 0.09 ng/mL for detecting intracranial lesions on CT yielded a sensitivity of 100% and a specificity of 21%, and that a UCH-L1 cutoff of 0.21 ng/mL for predicting the need for neurosurgical intervention yielded a sensitivity of 100% and a specificity of 57%.⁴⁷ Mondello et al. showed that the glial-neuronal ratio, defined as the ratio of GFAP concentration (ng/mL) to UCH-L1 concentration (ng/mL), predicted focal mass lesions at a cutoff of >1.43 with a specificity of 83% and a sensitivity of 60%.⁴⁸
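Applying such published cutoffs to this cohort would require converting the pg/mL values reported in Tables 1 and 2 to ng/mL (1 ng/mL = 1000 pg/mL), whereas the glial-neuronal ratio is unit-free as long as both markers share a unit. The sketch below illustrates the conversion, the ratio, and the sensitivity and specificity of a cutoff rule. The patient values are hypothetical, the helper names are illustrative, and calling a result positive when the value is strictly greater than the cutoff is an assumption about how the cutoffs are applied.

```python
import numpy as np

PG_PER_NG = 1000.0  # 1 ng/mL = 1000 pg/mL


def pg_to_ng_per_ml(values_pg_ml):
    """Convert concentrations from pg/mL (as in Tables 1 and 2) to ng/mL."""
    return np.asarray(values_pg_ml, dtype=float) / PG_PER_NG


def glial_neuronal_ratio(gfap, uchl1):
    """GFAP / UCH-L1; unit-free as long as both use the same unit."""
    return np.asarray(gfap, dtype=float) / np.asarray(uchl1, dtype=float)


def cutoff_performance(values, has_condition, cutoff):
    """Sensitivity and specificity when 'value > cutoff' is called positive."""
    values = np.asarray(values, dtype=float)
    truth = np.asarray(has_condition, dtype=bool)
    positive = values > cutoff
    tp = np.sum(positive & truth)
    fn = np.sum(~positive & truth)
    tn = np.sum(~positive & ~truth)
    fp = np.sum(positive & ~truth)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical UCH-L1 values (pg/mL) and CT lesion status for six patients.
uchl1_pg = [60, 150, 400, 85, 1200, 95]
lesion_on_ct = [False, True, True, False, True, False]
sens, spec = cutoff_performance(pg_to_ng_per_ml(uchl1_pg), lesion_on_ct, cutoff=0.09)
print(round(sens, 2), round(spec, 2))  # 1.0 0.67

# Hypothetical GFAP and UCH-L1 pair (pg/mL) checked against the >1.43 ratio cutoff.
ratio = glial_neuronal_ratio([2000.0], [1000.0])
print(ratio > 1.43)  # [ True]
```

A low cutoff such as 0.09 ng/mL favors sensitivity at the cost of specificity, matching the 100% sensitivity and 21% specificity pattern reported by Papa et al.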

Our study has several limitations. First, it shares the inherent constraints of retrospective studies. Second, because our data came from a single institution, the results need external validation in a broader and more diverse patient population to establish their consistency and generalizability across settings and populations. Third, we did not account for comorbidities that could affect patient recovery. Finally, an imbalanced training data distribution can cause learning algorithms to underperform on the minority class, and imbalanced test data can make certain metrics misleading.⁴⁹ The findings for predictions 2 and 3, which showed marked class imbalance, should therefore be interpreted with caution.

Conclusion

ML may help accurately predict the hospital course of TBI patients by combining clinical and laboratory parameters with non-contrast head CT CDEs. Blood biomarkers such as GFAP and UCH-L1, whose inclusion distinguishes our study, were among the significant predictor variables. ML models have the potential to enhance prognostic classification.

Supplemental Material

Supplementary Material for Enhancing hospital course and outcome prediction in patients with traumatic brain injury: A machine learning study by Guangming Zhu, Burak B Ozkara, Hui Chen, Bo Zhou, Bin Jiang, Victoria Y Ding and Max Wintermark in The Neuroradiology Journal.

Footnotes

Author contributions

Conceptualization, G.Z. and M.W. Methodology, G.Z. and M.W. Software, G.Z. and B.O. Validation, B.O., H.C., B.Z., B.J., V.D., and M.W. Formal analysis, G.Z. Investigation, G.Z. and M.W. Resources, M.W. Data Curation, H.C., B.Z., B.J., and V.D. Writing - Original Draft, G.Z. and B.O. Writing - Review & Editing, M.W. Visualization, B.O. and G.Z. Supervision, M.W. Project administration, M.W.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Burak B Ozkara

Hui Chen

Max Wintermark

Supplemental Material

Supplemental material for this article is available online.

References

1. GBD 2016 Traumatic Brain Injury and Spinal Cord Injury Collaborators. Global, regional, and national burden of traumatic brain injury and spinal cord injury, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol 2019; 18: 56–87.

2. Dewan, Rattani, Gupta, et al. Estimating the global incidence of traumatic brain injury. J Neurosurg 2018; 1–18.

3. Multiple Cause of Death Data on CDC WONDER, https://wonder.cdc.gov/mcd.html (accessed 10 January 2023).

4. Fang, Pan, Zhao, et al. A machine learning-based approach to predict prognosis and length of hospital stay in adults and children with traumatic brain injury: retrospective cohort study. J Med Internet Res 2022; 24: e41819.

5. Adil, Elahi, Patel, et al. Deep learning to predict traumatic brain injury outcomes in the low-resource setting. World Neurosurg 2022; 164: e8–e16.

6. Marincowitz, Paton, Lecky, et al. Predicting need for hospital admission in patients with traumatic brain injury or skull fractures identified on CT imaging: a machine learning approach. Emerg Med J 2022; 39: 394–401.

7. Nourelahi, Dadboud, Khalili, et al. A machine learning model for predicting favorable outcome in severe traumatic brain injury patients after 6 months. Acute Crit Care 2022; 37: 45–52.

8. Pease, Arefan, Barber, et al. Outcome prediction in patients with severe traumatic brain injury using deep learning from head CT scans. Radiology 2022; 304: 385–394.

9. Rostami, Gustafsson, Hånell, et al. Prognosis in moderate-severe traumatic brain injury in a Swedish cohort and external validation of the IMPACT models. Acta Neurochir 2022; 164: 615–624.

10. Wang, Yang, Zhu, et al. An update on diagnostic and prognostic biomarkers for traumatic brain injury. Expert Rev Mol Diagn 2018; 18: 165–180.

11. Schweitzer, Niogi, Whitlow, et al. Traumatic brain injury: imaging patterns and complications. Radiographics 2019; 39: 1571–1595.

12. Wintermark, Sanelli, Anzai, et al. Imaging evidence and recommendations for traumatic brain injury: conventional neuroimaging techniques. J Am Coll Radiol 2015; 12: e1–e14.

13. Wintermark, Ding, et al. Neuroimaging radiological interpretation system for acute traumatic brain injury. J Neurotrauma 2018; 35: 2665–2672.

14. Raj, Siironen, Skrifvars, et al. Predicting outcome in traumatic brain injury: development of a novel computerized tomography classification system (Helsinki computerized tomography score). Neurosurgery 2014; 75: 632–646.

15. Marshall, Klauber, et al. A new classification of head injury based on computerized tomography. J Neurosurg 1991; 75: S14–S20.

16. Maas AIR, Hukkelhoven CWPM, Marshall, et al. Prediction of outcome in traumatic brain injury with computed tomographic characteristics: a comparison between the computed tomographic classification and combinations of computed tomographic predictors. Neurosurgery 2005; 57: 1173–1182.

17. Haacke, Duhaime, Gean, et al. Common data elements in radiologic imaging of traumatic brain injury. J Magn Reson Imaging 2010; 32: 516–543.

18. Bazarian, Biberthaler, Welch, et al. Serum GFAP and UCH-L1 for prediction of absence of intracranial injuries on head CT (ALERT-TBI): a multicentre observational study. Lancet Neurol 2018; 17: 782–789.

19. Yin, Weng, Lai, et al. [GCS score combined with CT score and serum S100B protein level Can evaluate severity and early prognosis of acute traumatic brain injury]. Nan Fang Yi Ke Da Xue Xue Bao 2021; 41: 543–548.

20. Posti, Takala RSK, Raj, et al. Admission levels of Interleukin 10 and Amyloid β 1-40 improve the outcome prediction performance of the Helsinki computed tomography score in traumatic brain injury. Front Neurol 2020; 11: 549527.

21. Karabacak, Ozkara, Mordag, et al. Deep learning for prediction of isocitrate dehydrogenase mutation in gliomas: a critical approach, systematic review and meta-analysis of the diagnostic test performance using a Bayesian approach. Quant Imaging Med Surg 2022; 12: 4033–4046.

22. Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 2019; 1: 206–215.

23. Stanford Healthcare Trauma Guidelines. (accessed 28 August 2023).

24. Zhou, Ding, et al. Validation of the neuroimaging radiological interpretation system for acute traumatic brain injury. J Comput Assist Tomogr 2019; 43: 690–696.

25. Güiza, Depreitere, Piper, et al. Novel methods to predict increased intracranial pressure during intensive care and long-term neurologic outcome after traumatic brain injury: development and validation in a multicenter dataset. Crit Care Med 2013; 41: 554–564.

26. Farzaneh, Williamson, Gryak, et al. A hierarchical expert-guided machine learning framework for clinical decision support systems: an application to traumatic brain injury prognostication. NPJ Digit Med 2021; 4: 78.

27. Hernandes Rocha, Elahi, Cristina da Silva, et al. A traumatic brain injury prognostic model to support in-hospital triage in a low-income country: a machine learning-based approach. J Neurosurg 2019; 132: 1961–1969.

28. Say, Chen, Sun, et al. Machine learning predicts improvement of functional outcomes in traumatic brain injury patients after inpatient rehabilitation. Front Rehabil Sci 2022; 3: 1005168.

29. Matsuo, Aihara, Nakai, et al. Machine learning to predict in-hospital morbidity and mortality after traumatic brain injury. J Neurotrauma 2020; 37: 202–210.

30. Shih Y-J, Liu Y-L, Chen J-H, et al. Prediction of intraparenchymal hemorrhage progression and neurologic outcome in traumatic brain injury patients using radiomics score and clinical parameters. Diagnostics 2022; 12: 1677.

31. Mohamed, Alamri, Mohamed, et al. Prognosticating outcome using magnetic resonance imaging in patients with moderate to severe traumatic brain injury: a machine learning approach. Brain Inj 2022; 36: 353–358.

32. Abujaber, Fadlalla, Gammoh, et al. Prediction of in-hospital mortality in patients with post traumatic brain injury using National Trauma Registry and Machine Learning Approach. Scand J Trauma Resusc Emerg Med 2020; 28: 44.

33. Hsu S-D, Chao, Chen S-J, et al. Machine learning algorithms to predict in-hospital mortality in patients with traumatic brain injury. J Pers Med 2021; 11: 1144.

34. Rau C-S, Kuo P-J, Chien P-C, et al. Mortality prediction in patients with isolated moderate and severe traumatic brain injury using machine learning models. PLoS One 2018; 13: e0207192.

35. Warman, Seas, Satyadev, et al. Machine learning for predicting in-hospital mortality after traumatic brain injury in both high-income and low- and middle-income countries. Neurosurgery 2022; 90: 605–612.

36. Lee, Hwang, et al. A machine learning-based prognostic model for the prediction of early death after traumatic brain injury: comparison with the Corticosteroid Randomization After Significant Head Injury (CRASH) Model. World Neurosurg 2022; 166: e125–e134.

37. Amorim, Oliveira, Malbouisson, et al. Prediction of early TBI mortality using a machine learning approach in a LMIC population. Front Neurol 2019; 10: 1366.

38. Wang, Zhang, et al. XGBoost machine learning algorism performed better than regression models in predicting mortality of moderate-to-severe traumatic brain injury. World Neurosurg 2022; 163: e617–e622.

39. Satyadev, Warman, Seas, et al. Machine learning for predicting discharge disposition after traumatic brain injury. Neurosurgery 2022; 90: 768–774.

40. Karabacak, Margetis. Prognosis at your fingertips: a machine learning-based web application for outcome prediction in acute traumatic epidural hematoma. J Neurotrauma. DOI: 10.1089/neu.2023.0122.

41. Moyer J-D, Lee, Bernard, et al. Machine learning-based prediction of emergency neurosurgery within 24 h after moderate to severe traumatic brain injury. World J Emerg Surg 2022; 17: 42.

42. Korley, Jain, Sun, et al. Prognostic value of day-of-injury plasma GFAP and UCH-L1 concentrations for predicting functional recovery after traumatic brain injury in patients from the US TRACK-TBI cohort: an observational cohort study. Lancet Neurol 2022; 21: 803–813.

43. Helmrich IRAR, Czeiter, Amrein, et al. Incremental prognostic value of acute serum biomarkers for functional outcome after traumatic brain injury (CENTER-TBI): an observational cohort study. Lancet Neurol 2022; 21: 792–802.

44. Pei, Tang, Zhang, et al. The diagnostic and prognostic value of glial fibrillary acidic protein in traumatic brain injury: a systematic review and meta-analysis. Eur J Trauma Emerg Surg. DOI: 10.1007/s00068-022-01979-y.

45. Thelin, Al Nimer, Frostell, et al. A serum protein biomarker panel improves outcome prediction in human traumatic brain injury. J Neurotrauma 2019; 36: 2850–2862.

46. Peters, Schnell, Saugstad, et al. Longitudinal course of traumatic brain injury biomarkers for the prediction of clinical outcomes: a review. J Neurotrauma 2021; 38: 2490–2501.

47. Papa, Lewis, Silvestri, et al. Serum levels of ubiquitin C-terminal hydrolase distinguish mild traumatic brain injury from trauma controls and are elevated in mild and moderate traumatic brain injury patients with intracranial lesions and neurosurgical intervention. J Trauma Acute Care Surg 2012; 72: 1335–1344.

48. Mondello, Jeromin, Buki, et al. Glial neuronal ratio: a novel index for differentiating injury type in patients with severe traumatic brain injury. J Neurotrauma 2012; 29: 1096–1104.

49. Jeni, Cohn, De La Torre. Facing imbalanced data: recommendations for the use of performance metrics. In: 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction. 2013, 245–251.
