Sage Journals: Discover world-class research

Abstract

Computed tomography (CT) is an important imaging modality for guiding prognostication in patients with traumatic brain injury (TBI). However, because of the specialized expertise necessary, timely and dependable TBI prognostication based on CT imaging remains challenging. This study aimed to enhance the efficiency and reliability of TBI prognostication by employing machine learning (ML) techniques on CT images. A retrospective analysis was conducted on the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) data set (n = 1016). An ML-driven binary classifier was developed to predict favorable or unfavorable outcomes at 6 months post-injury. The prognostic performance was assessed using the area under the curve (AUC) over fivefold cross-validation and compared with conventional models that depend on clinical variables and CT scoring systems. An external validation was performed using the Comparative Indian Neurotrauma Effectiveness Research in Traumatic Brain Injury (CINTER-TBI) data set (n = 348). The developed model achieved superior performance without the necessity for manual CT assessments (AUC = 0.846 [95% CI: 0.843–0.849]) compared with the model based on the clinical and laboratory variables (AUC = 0.817 [95% CI: 0.814–0.820]) and established CT scoring systems requiring manual interpretations (AUC = 0.829 [95% CI: 0.826–0.832] for Marshall and 0.838 [95% CI: 0.835–0.841] for International Mission for Prognosis and Analysis of Clinical Trials in TBI [IMPACT]). The external validation demonstrated the prognostic capacity of the developed model to be significantly better (AUC = 0.859 [95% CI: 0.857–0.862]) than the model using clinical variables (AUC = 0.809 [95% CI: 0.798–0.820]). This study established an ML-based model that provides efficient and reliable TBI prognosis based on CT scans, with potential implications for earlier intervention and improved patient outcomes.

Introduction

Traumatic brain injury (TBI), characterized by a disruption in brain function from external forces to the head, ranks as a primary cause of death and disability in trauma patients globally.¹,² Annually, a significant number of TBI patients undergo initial evaluation at emergency departments. Healthcare professionals must depend on precise prognostication to provide tailored treatment and guidance for patients and their families. Computed tomography (CT) plays a crucial role in guiding prognostication, as CT findings delineate a wide range of structural effects of TBI. Although magnetic resonance imaging (MRI) may be more sensitive in detecting small white-matter lesions in later TBI stages or for follow-ups, CT remains the imaging modality of choice for initial TBI assessment because of its wider availability, cost effectiveness, and faster imaging process.³ However, manually interpreting CT scans of head trauma can be complex and time intensive, necessitating specialized expertise.⁴ Additionally, manual assessment may result in inconsistent predictions of outcomes. According to a study led by Sarigul and coworkers, discordant prognostic perspectives among clinicians are common, with >70% reporting occasional to frequent differences in prognostic viewpoints with their colleagues.⁵

Several CT scoring systems – Marshall,⁶ Rotterdam,⁷ International Mission for Prognosis and Analysis of Clinical Trials in TBI (IMPACT),⁸ and Helsinki⁹ – have been proposed for a more efficient CT assessment. Yet, these systems can be demanding for some clinicians because of the requirement for manual measurements and meticulous classification of CT findings. This aligns with previous literature indicating that these systems are rarely used in medical practice.⁵ The time required for measuring CT findings and variability in outcomes among clinicians often lead to non-standardized prognostications.

Machine learning (ML) techniques applied to CT imaging have recently garnered significant interest for TBI prognostication.¹⁰–¹² However, most existing approaches rely on CT imaging features obtainable only through manual assessments, necessitating extensive clinical experience.⁴ Consequently, there is a pressing need for an ML-based model capable of automatically predicting long-term outcomes using CT and clinical data to support clinicians of all expertise levels.

Recently, the Vision Transformer (ViT)¹³ has emerged as a prominent ML advancement, offering a competitive alternative to convolutional neural networks. ViT has been successfully applied to various ML tasks, including classification, segmentation, and reconstruction, for two-dimensional (2D) images and three-dimensional (3D) data, such as videos.¹⁴ This success has spurred efforts to adapt ViT for 3D medical imaging analysis, such as CT, by considering a video to be analogous to a patient's CT scan: each video frame corresponds to a CT slice capturing a section of the patient's body.¹⁵

Nonetheless, merely substituting a video for a patient's CT scan does not suffice for creating ML-based TBI prognostic models. As indicated in prior research,⁹ both CT imaging and clinical variables are needed as input data, rather than using either one exclusively. This finding suggests that a novel 3D ViT architecture accommodating multimodal inputs is needed, as opposed to conventional ViT architectures limited to single-modal inputs.

This study aims to develop a rapid and reliable prognostic model for TBI patients utilizing raw CT imaging data. We hypothesize that our proposed algorithm can predict long-term outcomes for TBI patients without manual CT assessments and yield superior prognoses compared with conventional models requiring manual CT interpretations.

The contributions of this study are as follows: (1) introduction of an ML-based model to predict 6-month outcomes using raw CT imaging and clinical data, without the need for manual CT assessments; (2) proposal of an effective method for multimodal inputs, substantially improving prognostic performance within the efficient 3D ViT architecture; (3) verification of our method's efficacy using the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI), an international prospective TBI database; and (4) external validation to confirm our method's generalizability and applicability in clinical practice.

Methods

This investigation adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines.¹⁶

Study population and data set

The CENTER-TBI and Comparative Indian Neurotrauma Effectiveness Research in Traumatic Brain Injury (CINTER-TBI) data sets were employed for model development and external validation, respectively.

CENTER-TBI data set

To construct a prognostic model, we incorporated data from the CENTER-TBI study (version Core 3.0).¹⁷ The data for the CENTER-TBI study were collected through the Quesgen e-CRF (Quesgen Systems Inc., USA), hosted on the INCF platform, and extracted via the INCF Neurobot tool (INCF, Sweden). The CENTER-TBI database consists of patients who were prospectively enrolled at >60 participating centers throughout Europe and Israel from December 19, 2014, to December 17, 2017. Inclusion criteria encompassed a clinical TBI diagnosis, CT scan indication, admission to the study center within 24 h of injury, and informed consent acquisition in accordance with local and national regulations. Exclusion criteria involved severe pre-existing neurological disorders that could confound outcome assessments. The CENTER-TBI study¹⁸ prospectively designated patients to one of the following care pathways: emergency room assessment and discharge, hospital ward admission, or primary intensive care unit (ICU) admission. At 6 months post-injury, patient outcomes were assessed using the Glasgow Outcome Scale-Extended (GOSE),¹⁹ a widely accepted instrument for evaluating disability and recovery following TBI. The clinical research team of CENTER-TBI conducted neuropsychological assessments and structured interviews/questionnaires to determine GOSE scores.

We utilized the following data from the CENTER-TBI database as predictors in our analyses: clinical variables (age, pupillary reactivity, and Glasgow Coma Scale [GCS] motor score); laboratory variables (glucose [mmol/L] and hemoglobin [g/dL]); and initial non-contrast brain CT scans obtained at a study hospital following injury.

CINTER-TBI data set

After developing a prognostic model utilizing the CENTER-TBI data set, we conducted an external validation using the CINTER-TBI data set, which encompasses data from 1046 patients with TBI admitted to two major trauma centers in New Delhi, India between July 2016 and December 2017. The CINTER-TBI study adhered to the same inclusion and exclusion criteria as the CENTER-TBI study. The key difference between these studies resides in the GOSE scoring methodology. Investigators in the CINTER-TBI study employed questionnaires, as opposed to the structured interviews used in CENTER-TBI, to inquire about patients' quality of life and subjective feelings. The final GOSE rating was determined based on the clinical judgment of the CINTER-TBI investigators. Moreover, whereas the CENTER-TBI study collected both clinical and laboratory variables, the CINTER-TBI data set only contained clinical variables. To address this discrepancy, we developed a model that did not depend on laboratory variables for external validation using the CINTER-TBI data set.

Outcome scale

All prognostic models examined in this study serve as binary classifiers for the dichotomized 6-month GOSE, distinguishing unfavorable outcomes (GOSE 1–4: death, vegetative state, severe disability) from favorable ones (GOSE 5–8: moderate disability, good recovery), a classification method that has been widely employed in previous research.²⁰,²¹ The binary classifier produces a probability of an unfavorable outcome at 6 months post-injury. Consequently, the model's prediction is anticipated to be high for patients with unfavorable outcomes and low for those with favorable outcomes. The ground-truth 6-month GOSE values for training the classifier were retrieved from the actual patient outcomes recorded in the CENTER-TBI data set. The potential expansion from binary to a multi-class classifier for predicting more detailed GOSE categories will be addressed later in the Discussion.

Baseline models

Several baseline models were introduced to compare their prognostic performance with our proposed model (Table 1).

Table 1.

Baseline Models and Developed Models with Their Input Predictors and Output

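Approach	Input: clinical variables (CV)	Input: laboratory variables (LV)	Input: CT	Output
Baseline type 1 (CV or LV only)	✓ Age, pupil, motor score	-	-	Probability of unfavorable outcome
	✓	✓ Glucose, hemoglobin	-	Probability of unfavorable outcome
Baseline type 2 (CT scoring system)	-	-	✓ Marshall score	Probability of unfavorable outcome
	-	-	✓ Rotterdam score	Probability of unfavorable outcome
	-	-	✓ IMPACT score	Probability of unfavorable outcome
	-	-	✓ Helsinki score	Probability of unfavorable outcome
Ours (Automated CT interpretation)	-	-	✓ Raw CT images	Probability of unfavorable outcome
Baseline type 3 (CT scoring system + CV)	✓	-	✓ Marshall score	Probability of unfavorable outcome
	✓	-	✓ Rotterdam score	Probability of unfavorable outcome
	✓	-	✓ IMPACT score	Probability of unfavorable outcome
	✓	-	✓ Helsinki score	Probability of unfavorable outcome
Ours + CV	✓	-	✓ Raw CT images	Probability of unfavorable outcome
Baseline type 4 (CT scoring systems + CV + LV)	✓	✓	✓ Marshall score	Probability of unfavorable outcome
	✓	✓	✓ Rotterdam score	Probability of unfavorable outcome
	✓	✓	✓ IMPACT score	Probability of unfavorable outcome
	✓	✓	✓ Helsinki score	Probability of unfavorable outcome
Ours + CV + LV	✓	✓	✓ Raw CT images	Probability of unfavorable outcome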

CT, computed tomography; IMPACT, International Mission for Prognosis and Analysis of Clinical Trials in Traumatic Brain Injury.

The first category of baseline models features a scalar predictor, also referred to as the IMPACT core model,⁸ which is computed by summing up three clinical variables (age, pupillary reactivity, and motor score from GCS). Another baseline model, known as the IMPACT laboratory model,⁸ incorporates both clinical and laboratory variables (glucose and hemoglobin) and employs a scalar predictor on a scale of 0–7 to estimate the 6-month outcome. In alignment with the IMPACT study paper,⁸ which employs a scalar for TBI outcome prediction, we similarly used scalar values for clinical and laboratory variables. Although it is technically feasible to treat these variables separately and form a vector, using scalar input was chosen for an equitable comparison of prognostic abilities between our algorithm and the IMPACT-based models.

In the second category, the baseline models are established on CT scoring systems that systematically quantify CT findings pertinent to TBI prognostication. The systems considered in this study include Marshall,⁶ Rotterdam,⁷ IMPACT,⁸ and Helsinki.⁹ They offer scalar indicators, relying on the presence or absence of typical CT findings among TBI patients (e.g., traumatic subarachnoid hemorrhage, subdural hematoma, midline shift). Notably, these systems do not generate prognoses automatically. Manual interpretation and measurement are mandatory to assess CT findings and calculate CT scores. The manually calculated, verified, and recorded CT scores in the CENTER-TBI database were used in our analyses.

The third category of baseline models relies on both clinical variables and CT scoring systems. The predictor is a 2D vector, with one component representing a score from one of the CT scoring systems (Marshall, Rotterdam, or Helsinki) and the other component derived from the IMPACT core model. For the IMPACT-based CT scoring system, we employed an extended model proposed in the literature⁸ that combined scores of clinical variables and CT findings to produce a scalar predictor.

The fourth category incorporates CT findings, clinical variables, and laboratory variables, using 2D vectorized predictors. The first element is the sum of the IMPACT core or IMPACT laboratory models, and the second element is a score from one of the CT scoring systems (Marshall, Rotterdam, or Helsinki). An IMPACT extended model was utilized, combining clinical variables, CT findings, and laboratory variables to produce a scalar predictor. The IMPACT extended model's prior adoption in the context of TBI prognostication is well noted in the literature.⁹

Each of the four baseline model types inputs either a single scalar or a two-element vector into the 3D-ViT model's classification head (refer to Document S1 for more details), functioning comparably to a logistic regression classifier. The output from this classification head is expressed as a probability, ranging from zero to one, predicting the likelihood of an unfavorable outcome 6 months post-injury. It is important to note that the original IMPACT study did not explicitly propose a prognostic model based solely on laboratory variables or CT findings. These models were introduced in our study as baselines to illustrate the potential of single-modality-based predictions and their comparison with our proposed algorithm. The methodology for calculating the input values for these baseline models was adapted from the calculator presented in the IMPACT study.⁸
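As a minimal sketch of what "functioning comparably to a logistic regression classifier" means for a scalar or two-element predictor, the snippet below applies a sigmoid to a weighted sum. The function name, weights, and bias are hypothetical values chosen for illustration; they are not fitted to any data and are not the parameters of the classification head described in Document S1.

```python
import math

def logistic_head(features, weights, bias):
    """Map a scalar (length-1) or two-element predictor to a probability in (0, 1)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, for illustration only.
# Scalar predictor, e.g. a single summed score.
p_scalar = logistic_head([4.0], weights=[0.5], bias=-2.0)
# Two-element predictor, e.g. (clinical score, CT score).
p_vector = logistic_head([4.0, 3.0], weights=[0.5, 0.4], bias=-3.0)
```

In this toy setting, a larger score raises the predicted probability of an unfavorable outcome because both illustrative weights are positive.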

The clinical variables, scores from the CT scoring systems, and laboratory variables used for these baseline models were all extracted from the CENTER-TBI data set. Detailed information on the clinical and laboratory variables, as well as data curation conducted in the CENTER-TBI data set, can be found in previous CENTER-TBI studies.¹⁷,¹⁸

Algorithm design

Current CT scoring systems necessitate manual assessments to identify and quantify CT findings for calculating prognostic scores. This makes timely and reliable TBI prognostication using these systems challenging. We sought to develop an ML algorithm that can interpret CT imaging data automatically, thus eliminating manual assessments. This method provides end-to-end prognostication, allowing direct prognosis computation from raw CT data without human intervention, and the ML algorithm was anticipated to identify imaging features relevant to TBI prognostication.

More importantly, our model does not merely input raw CT images into pre-existing ML models. Instead, we adapted a 3D ViT-based model, originally designed for video recognition, to CT imaging. The original 3D ViT model accepted only single-modal input (primarily video data) and did not accommodate multimodal predictors (such as clinical variables, laboratory variables, and CT images). Several studies have explored multimodal inputs within the medical imaging field, a concept known as early fusion.²² However, as subsequently demonstrated, the conventional early fusion approach underperformed in the context of TBI prognostication. To address this limitation, we proposed an augmented early fusion technique as a simple and effective solution for TBI prognostication, suitable for implementation within the 3D ViT-based architecture. The details of input data, pre-processing, 3D ViT architecture, and the augmented early fusion technique can be found in the supplementary material (Documents S1 and S2).

All ML models were implemented with the PyTorch framework on a workstation with two NVIDIA GeForce RTX 3090 GPU cards, each with 24 GB of memory.

Algorithm evaluation

We developed a model that predicts the probability of unfavorable outcomes for each CT scan. Ideally, the model's predictions should be one for patients with unfavorable outcomes and zero for patients with favorable outcomes. We assessed the performance of our proposed algorithm and baseline models using a fivefold cross-validation on the CENTER-TBI data set, based on the following metrics: area under the curve (AUC), sensitivity, and specificity. Following internal validation, we performed external validation by calculating the evaluation metrics on the CINTER-TBI data set using models pre-trained on the CENTER-TBI data set.

To dichotomize predictions (unfavorable vs. favorable), we employed the Youden index²³ for determining cutoff points. The Youden index (or Youden's J statistic) is quantified as J = sensitivity + specificity – 1.²⁴ We chose the threshold at which the maximum Youden index was observed, which is a common practice in determining optimal cutoff points in medical literature.²⁵
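The cutoff selection described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: labels are coded 1 for unfavorable and 0 for favorable, a case is predicted unfavorable when its probability is at or above the threshold, candidate thresholds are the observed probabilities, and the toy labels and probabilities are made up.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity and specificity when predicting 'unfavorable' if prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def youden_cutoff(y_true, y_prob):
    """Return the threshold maximizing J = sensitivity + specificity - 1, and that J."""
    best_j, best_t = -1.0, None
    for t in sorted(set(y_prob)):
        sens, spec = sensitivity_specificity(y_true, y_prob, t)
        j = sens + spec - 1.0
        if j > best_j:  # on ties, the lowest threshold is kept
            best_j, best_t = j, t
    return best_t, best_j

# Made-up toy data: 1 = unfavorable outcome, 0 = favorable outcome.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
cutoff, j_stat = youden_cutoff(y_true, y_prob)
```

For the toy data, the selected cutoff is 0.5, where sensitivity is 3/3 and specificity is 4/5, giving J = 0.8.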

Subgroup analysis

We further evaluated our algorithm's performance within specific subgroups by comparing it with one of the baseline models. Participants were divided into three severity subgroups based on the GCS: mild (GCS = 13–15), moderate (GCS = 9–12), and severe (GCS = 3–8). GCS scores were available for all patients in the CENTER- and CINTER-TBI data sets. We calculated the evaluation metrics for each severity subgroup to determine the proposed method's applicability across different severity levels.
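The severity grouping can be expressed as a small helper, using the GCS bands listed in Table 2 (mild 13–15, moderate 9–12, severe 3–8). The function name is hypothetical, and the sketch assumes an integer total GCS score.

```python
def gcs_severity(gcs_total):
    """Map a total Glasgow Coma Scale score (3-15) to a severity subgroup."""
    if not 3 <= gcs_total <= 15:
        raise ValueError("total GCS must be between 3 and 15")
    if gcs_total >= 13:
        return "mild"
    if gcs_total >= 9:
        return "moderate"
    return "severe"

# Group a few illustrative scores by subgroup.
subgroups = {}
for score in [15, 13, 12, 9, 8, 3]:
    subgroups.setdefault(gcs_severity(score), []).append(score)
```

Evaluation metrics would then be computed separately on the patients in each subgroup.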

Discordant case analysis

We investigated discordant cases in which baseline and developed model predictions differed, categorizing them as: (1) true predictions by the baseline model and false predictions by the developed model, or (2) false predictions by the baseline model and true predictions by the developed model. The ground truth values for 6-month GOSE were drawn from actual patient outcomes in the CENTER-TBI data set. We designated the first category as “degraded cases” and the second category as “improved cases.” We tallied and compared these cases between the developed and a baseline model trained on the CENTER-TBI data set.
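The categorization above reduces to a simple rule for binary predictions: when the two models disagree, exactly one of them matches the ground truth. The sketch below illustrates this with made-up cases; the function name and the 0/1 coding (1 = unfavorable) are assumptions for illustration.

```python
from collections import Counter

def classify_discordance(truth, baseline_pred, developed_pred):
    """Return 'degraded', 'improved', or None (concordant) for binary 0/1 labels."""
    if baseline_pred == developed_pred:
        return None  # both models agree, so the case is not discordant
    # The models disagree, so exactly one of them equals the ground truth.
    return "degraded" if baseline_pred == truth else "improved"

# Made-up (truth, baseline prediction, developed prediction) triples.
cases = [(1, 1, 0), (0, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1)]
tally = Counter(classify_discordance(*case) for case in cases)
```

Here `tally` counts one degraded case, two improved cases, and two concordant cases (keyed by `None`).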

Statistical analysis

To assess the prognostic performance between the baseline and the developed models, we calculated a 95% confidence interval (CI) of AUC utilizing a non-parametric bootstrapping method.²⁶ Fifty non-training data points were randomly selected with replacement, and the bootstrapping procedures were repeated 2000 times. In subgroup analysis, we reported 95% CI for both sensitivity and specificity, applying the same bootstrapping method to each subgroup. The bootstrapping CI was computed using Bstrap, a Python package for CIs with the bootstrapping method.
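A non-parametric bootstrap for the AUC can be sketched as below. This is a hedged illustration rather than the Bstrap procedure itself: it uses only the Python standard library, a rank-based AUC, and a percentile interval, and the interval type actually produced by the authors' pipeline is not stated in the text. The defaults of 50 resampled points and 2000 repetitions mirror the numbers above; the demonstration data are synthetic.

```python
import random

def auc(y_true, y_score):
    """Rank-based AUC: P(positive scores higher than negative), ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, y_score, n_samples=50, n_reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC from resamples drawn with replacement."""
    if len(set(y_true)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    indices = range(len(y_true))
    stats = []
    while len(stats) < n_reps:
        draw = [rng.choice(indices) for _ in range(n_samples)]
        value = auc([y_true[i] for i in draw], [y_score[i] for i in draw])
        if value is not None:  # skip resamples containing a single class
            stats.append(value)
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_reps))]
    upper = stats[int(round((1 - alpha / 2) * n_reps)) - 1]
    return lower, upper

# Synthetic demonstration data (not patient data): 1 = unfavorable outcome.
demo_rng = random.Random(1)
y_true = [1 if demo_rng.random() < 0.35 else 0 for _ in range(200)]
y_score = [min(1.0, max(0.0, demo_rng.gauss(0.65 if y else 0.35, 0.2))) for y in y_true]
ci_low, ci_high = bootstrap_auc_ci(y_true, y_score)
```

The same resampling loop applies to sensitivity and specificity by swapping the statistic computed on each draw.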

In the discordant analysis, we investigated the distribution of each CT finding in both degraded and improved cases to identify specific CT findings that contributed to prediction discrepancies (degraded vs. improved). This aimed to assess the effect of particular CT findings on prediction discrepancies. A χ² independence test with a null hypothesis of no association between patient CT finding and prediction discrepancy was conducted. CT findings typical among TBI patients were examined, including traumatic subarachnoid hemorrhage, acute subdural hematoma, intraparenchymal hemorrhage, skull fracture, mass lesion, cisternal compression, ventricular compression, intraventricular hemorrhage, midline shift, cortical sulcus effacement, brain herniation, diffuse axonal injury (traumatic axonal injury), and epidural hematoma. The analysis was conducted using R (version 4.2.2).
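For a single CT finding, the independence test operates on a 2 × 2 table of discrepancy type (degraded vs. improved) against finding status (present vs. absent). The sketch below computes Pearson's χ² statistic and its p value for one degree of freedom in plain Python rather than R. It omits any continuity correction, and whether the original analysis applied one is not stated; the counts are hypothetical and are not taken from the study.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table, no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows = degraded, improved; columns = finding present, absent.
hypothetical_table = [[30, 64], [50, 42]]
chi2_stat, p_val = chi_square_2x2(hypothetical_table)
```

A small p value would lead to rejecting the null hypothesis of no association between the CT finding and the type of prediction discrepancy.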

Results

Patient characteristics

From 4510 patients in the CENTER-TBI data set, we excluded: (1) those missing clinical variables, laboratory variables, or CT scans (n = 3056), (2) those with inadequate CT scans; that is, blank or not encompassing the brain (n = 412), and (3) pediatric patients, meaning those <15 years of age (n = 26), leaving 1016 patients (median age 53.0 [32–68], 69.4% male) for analysis. Of the included patients, 49.7% had mild and 35.4% had severe TBI. The supplementary material includes a flow chart depicting the patient selection process (Fig. S1).

By using identical criteria for the CINTER-TBI data set, we included 348 patients (median age 32 [23–45], 80.2% male) for external validation. Favorable outcomes at 6 months were observed in 88.2% of patients in the CINTER-TBI data set and 63.4% in the CENTER-TBI data set. The CINTER-TBI study did not collect laboratory variables.

Patient characteristics in the CENTER-TBI and CINTER-TBI data sets, summarized in Table 2, represent a wide spectrum of TBI. A summary of characteristics for all patients, including those excluded from the CENTER-TBI data set, is available in the supplementary material (Table S1). Despite minor disparities, included and excluded patients showed similar trends in their attributes. Table S2 details patient characteristics in the CENTER-TBI data set, including other clinical presentations (GCS values, presence/absence of hypoxia and hypotension), and CT interpretations (Marshall, Rotterdam, Helsinki, lesions in radiological reports) concerning 6-month outcomes.

Table 2.

Patient Characteristics in CENTER- and CINTER-TBI Data Sets

Variable	Category	CENTER-TBI (n = 1016)	CINTER-TBI (n = 348)
Age, median (IQR)		53 (32–68)	32 (23–45)
Sex, n (%)	Female	311 (30.6)	69 (19.8)
	Male	705 (69.4)	279 (80.2)
Pupil reactivity, n (%)	Both pupils	867 (85.3)	324 (93.1)
	One pupil	51 (5.0)	10 (2.9)
	Neither pupil	98 (9.7)	14 (4.0)
Motor score, n (%)	Obeys/localizes (5–6)	672 (66.1)	291 (83.7)
	Normal/abnormal flexion (3–4)	72 (7.1)	27 (7.8)
	Extension/none (1–2)	272 (26.8)	30 (8.6)
Six-month Glasgow Outcome Scale-Extended (GOSE), n (%)	Unfavorable (GOSE ≤ 4)	372 (36.6)	41 (11.8)
	Favorable (GOSE ≥ 5)	644 (63.4)	306 (88.2)
Glasgow Coma Scale (GCS)-based severity, n (%)	Mild (GCS 13–15)	505 (49.7)	204 (58.6)
	Moderate (GCS 9–12)	129 (12.7)	6 (1.7)
	Severe (GCS 3–8)	360 (35.4)	138 (39.7)
	Unknown	22 (2.2)	0 (0.0)
Glucose (mmol/L), median (IQR)		7.33 (6.22–8.94)	N/A
Hemoglobin (g/dL), median (IQR)		13.3 (11.8–14.4)	N/A

Median values are shown with interquartile ranges (IQRs) for continuous variables, and frequencies and percentages are provided for categorical variables.

Prognostic performance using CT scans

To evaluate the capacity of CT interpretation, we initially assessed the performance of existing CT scoring systems and our developed model using only CT scans, without reliance on clinical and/or laboratory variables. The AUC for the previously established CT scoring systems was 0.720 (95% CI: 0.717–0.723), 0.732 (95% CI: 0.729–0.735), 0.762 (95% CI: 0.758–0.766), and 0.775 (95% CI: 0.771–0.779) for Marshall, Rotterdam, IMPACT, and Helsinki, respectively. Our proposed method, employing raw CT scans without the need for manual CT findings, as required by traditional CT scoring systems, achieved an AUC of 0.788 (95% CI: 0.785–0.791), significantly outperforming all existing CT scoring systems. This result indicates that our model effectively extracted imaging features crucial for prognosticating TBI patients, akin to the information provided by manual CT assessments in the CT scoring systems. We will explore the details of the extracted imaging features and the model's interpretability in the Discussion section.

Prognostic performance utilizing clinical and laboratory variables and CT scans

We assessed prognostic performance when clinical and laboratory variables were combined with CT information (Table 3). A model relying solely on clinical and laboratory variables had an AUC of 0.817 (95% CI: 0.814–0.820). With the inclusion of Marshall, Rotterdam, IMPACT, and Helsinki CT scores, the AUC increased to 0.829 (95% CI: 0.826–0.832), 0.830 (95% CI: 0.826–0.833), 0.838 (95% CI: 0.835–0.841), and 0.842 (95% CI: 0.839–0.845), respectively. These results align with a previous study,⁹ emphasizing that CT findings enhance AUC when combined with clinical and laboratory variables. Our model employing the augmented early fusion (AEF) technique achieved an AUC of 0.846 (95% CI: 0.843–0.849), significantly outperforming models utilizing Marshall, Rotterdam, and IMPACT, but not differing significantly from the Helsinki-based model. Thus, although conventional CT scoring systems require manual CT analysis and our model does not, our proposed algorithm matched or exceeded their performance in AUC.

Table 3.

Prognostic Performance of Baseline Models and Our Developed Models

Approach	Input: clinical variables (CV)	Input: laboratory variables (LV)	Input: CT	Internal validation (CENTER-TBI): AUC	Internal validation (CENTER-TBI): Sensitivity	Internal validation (CENTER-TBI): Specificity	External validation (CINTER-TBI): AUC	External validation (CINTER-TBI): Sensitivity	External validation (CINTER-TBI): Specificity
Baseline type 1 (Variables only)	✓	-	-	0.809	0.732	0.472	0.809	0.703	0.539
	✓	✓	-	0.817	0.673	0.762	-	-	-
Baseline type 2 (CT scoring system)	-	-	✓ Marshall	0.720	0.867	0.314	-	-	-
	-	-	✓ Rotterdam	0.732	0.644	0.542	-	-	-
	-	-	✓ IMPACT	0.762	0.664	0.525	-	-	-
	-	-	✓ Helsinki	0.775	0.734	0.592	-	-	-
Ours	-	-	✓ Raw CT scan	0.788	0.719	0.749	0.802	0.769	0.772
Baseline type 3 (CT scoring system + CV)	✓	-	✓ Marshall	0.830	0.785	0.717	-	-	-
	✓	-	✓ Rotterdam	0.822	0.817	0.689	-	-	-
	✓	-	✓ IMPACT	0.832	0.881	0.320	-	-	-
	✓	-	✓ Helsinki	0.836	0.763	0.782	-	-	-
Ours + CV	✓	-	✓ Raw CT scan	0.846	0.826	0.735	0.859	0.708	0.874
Baseline type 4 (CT scoring systems + CV + LV)	✓	✓	✓ Marshall	0.829	0.812	0.627	-	-	-
	✓	✓	✓ Rotterdam	0.830	0.619	0.838	-	-	-
	✓	✓	✓ IMPACT	0.838	0.811	0.490	-	-	-
	✓	✓	✓ Helsinki	0.842	0.795	0.764	-	-	-
Ours + CV + LV	✓	✓	✓ Raw CT scan	0.846	0.774	0.777	-	-	-

The performance of the developed model is shown when augmented early fusion was applied with the optimal parameter (k = 784).

CT, computed tomography; AUC, area under the curve; IMPACT, International Mission for Prognosis and Analysis of Clinical Trials in Traumatic Brain Injury.

When incorporating clinical and laboratory variables with CT scans (Table 3), our developed model exhibited a sensitivity and specificity of 0.774 (95% CI: 0.768–0.781) and 0.777 (95% CI: 0.770–0.783), respectively, significantly surpassing the baseline model utilizing variables alone (0.673 [95% CI: 0.667–0.679] and 0.762 [95% CI: 0.756–0.784]). Enhanced performance was also observed when adding clinical variables (Table 3). These findings demonstrate the effectiveness of our proposed automated TBI prognostication framework.

Subgroup analysis

Evaluation metrics are reported for severity-based subgroups, using a baseline model incorporating clinical and laboratory variables (Table 4). The developed model's AUC surpassed the baseline model's AUC across all subgroups (0.780 [95% CI: 0.776–0.784] and 0.804 [95% CI: 0.800–0.809] for mild, 0.786 [95% CI: 0.783–0.790] and 0.855 [95% CI: 0.853–0.857] for moderate, and 0.745 [95% CI: 0.742–0.748] and 0.795 [95% CI: 0.792–0.798] for severe cases, respectively). Sensitivity improvement was notably significant in severe cases (0.547 [95% CI: 0.539–0.556] for the baseline and 0.727 [95% CI: 0.722–0.732] for the developed model). In terms of specificity, the developed model significantly outperformed the baseline model in moderate cases (0.517 [95% CI: 0.506–0.527] for the baseline and 0.762 [95% CI: 0.722–0.732] for the developed model) and mild cases (0.676 [95% CI: 0.671–0.682] for the baseline and 0.783 [95% CI: 0.778–0.789] for the developed model).

Table 4.

Prognostic Performance in Relation to Glasgow Coma Scale (GCS)-Based Severities

Evaluation metric		AUC		Sensitivity		Specificity
Model		Baseline	Developed	Baseline	Developed	Baseline	Developed
Severity	Mild (GCS 13–15)	0.780	0.804	0.692	0.740	0.676	0.783
	Moderate (GCS 9–12)	0.786	0.855	0.849	0.855	0.517	0.762
	Severe (GCS 3–8)	0.745	0.795	0.547	0.727	0.807	0.795

The baseline model utilized clinical and laboratory variables, whereas the developed model incorporated these variables and computed tomography scans as predictors. Augmented early fusion was applied to the developed model with the optimal parameter (k = 784).

AUC, area under the curve.

Discordant case analysis

In this analysis, we compared two models that had comparable AUC: one combining clinical and laboratory variables with the Helsinki CT score, and the other employing our algorithm, using CT scans with these variables. Of 186 discordant cases (18.3%), 94 were degraded (9.3%) and 92 were improved cases (9.1%) throughout fivefold cross-validation. Ground truth outcomes were 61 (64.9%) favorable and 33 (35.1%) unfavorable for degraded cases, and 68 (73.9%) favorable and 24 (26.1%) unfavorable for improved cases. A statistically significant association between CT findings and prediction discrepancies was observed when the following CT findings were present: traumatic subarachnoid hemorrhage (p = 0.023), intraparenchymal hemorrhage (p = 0.022), cisternal compression (p < 0.01), ventricular compression (p < 0.01), intraventricular hemorrhage (p = 0.018), midline shift (p = 0.027), brain herniation (p = 0.038), and diffuse axonal injury (p < 0.01). Figure S2 shows distributions of all these CT findings for degraded and improved cases.

External validation

The prognostic ability was evaluated on the CINTER-TBI data set (Table 3) utilizing only clinical variables, as laboratory variables were absent. The AUC was 0.809 (95% CI: 0.798–0.820) for the baseline model using clinical variables alone and 0.802 (95% CI: 0.798–0.805) for our model using CT scans alone. When CT scans were combined with clinical variables, the AUC of our model increased to 0.859 (95% CI: 0.857–0.862), significantly outperforming the baseline model that utilized only clinical variables. This AUC value (0.859) is even higher than the 0.846 achieved in the CENTER-TBI data set. Specificity was significantly improved: 0.539 (95% CI: 0.527–0.552) for the baseline and 0.874 (95% CI: 0.871–0.877) for the developed model. No significant difference was detected between the baseline and developed models in terms of sensitivity (0.703 [95% CI: 0.692–0.713] and 0.708 [95% CI: 0.707–0.708]). These findings corroborate the prognostic performance seen in the model development using the CENTER-TBI data set, supporting the generalizability of our algorithm.

Multi-class GOSE prediction

Up to this point, our investigation of TBI prognostication has focused on binary classification; specifically, distinguishing between unfavorable (GOSE = 1–4) and favorable (GOSE = 5–8) outcomes. Nevertheless, the original purpose of the GOSE scale was to categorize patient outcomes into eight distinct categories: (1) dead, (2) vegetative state, (3) lower severe disability, (4) upper severe disability, (5) lower moderate disability, (6) upper moderate disability, (7) lower good recovery, and (8) upper good recovery.¹ Consequently, in this subsection, we extend our proposed algorithm to multi-class classification tasks and explore multi-class GOSE prognostic models.

Initially, we performed a four-class GOSE prediction using the CENTER-TBI data set, with the following categories: dead (GOSE = 1), vegetative state or severe disability (GOSE = 2–4), moderate disability (GOSE = 5–6), and good recovery (GOSE = 7–8). Subsequently, we trained and evaluated a seven-class GOSE prediction; GOSE = 1, GOSE = 2 or 3, GOSE = 4, … , GOSE = 8. We note that the original CENTER-TBI data set did not distinguish between GOSE = 2 and 3, so these classes were merged. The experimental settings remained the same as for binary classification, with the exception that the binary classification head of TimeSformer with a sigmoid activation function was replaced by a multi-class head with a softmax activation function.
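The class groupings described above, and the change from a sigmoid to a softmax output, can be sketched as follows. The groupings follow the text, whereas the zero-based class indices, the function names, and the example logits are illustrative assumptions rather than the study's implementation.

```python
import math


def gose_to_class(gose, scheme):
    """Map a GOSE score (1-8) to a zero-based class index for a given scheme."""
    if not 1 <= gose <= 8:
        raise ValueError("GOSE must be between 1 and 8")
    if scheme == "binary":
        return 0 if gose <= 4 else 1        # unfavorable (1-4) vs. favorable (5-8)
    if scheme == "four":
        if gose == 1:
            return 0                        # dead
        if gose <= 4:
            return 1                        # vegetative state or severe disability
        if gose <= 6:
            return 2                        # moderate disability
        return 3                            # good recovery
    if scheme == "seven":
        if gose == 1:
            return 0
        if gose <= 3:
            return 1                        # GOSE = 2 and 3 are merged
        return gose - 2                     # GOSE = 4..8 map to classes 2..6
    raise ValueError("unknown scheme: " + scheme)


def sigmoid(logit):
    """Binary head: a single logit mapped to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))


def softmax(logits):
    """Multi-class head: one logit per class mapped to probabilities summing to 1."""
    shift = max(logits)                     # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


four_class_probs = softmax([0.2, 1.5, -0.3, 0.8])  # hypothetical logits
```

Only the output layer and the label encoding change between the binary and multi-class settings; the rest of the pipeline can stay the same, as stated above.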

The AUCs for the baseline model using clinical and laboratory variables and for the developed model were 0.713 and 0.758, respectively, for the four-class task, and 0.651 and 0.688, respectively, for the seven-class task.

Discussion

In this study, we developed a novel ML-based prognostic model that exhibits superior predictive ability for 6-month outcomes compared with existing models utilizing CT scores. Our method's enhanced prognostic performance was further corroborated through external validation. Unlike previous studies, our model removes the need for manual CT assessments, possibly mitigating assessor bias.²⁷ The development of clinical decision support software founded on our algorithm might help clinicians to promptly and accurately prognosticate TBI patients, regardless of the clinician's expertise level, thus contributing to timely treatment and enhanced patient outcomes.

Predictive ability

The predictive ability of our proposed model surpasses most baseline models in AUC, even without manual assessments (Table 3). Notably, this enhanced prognostic performance was preserved during external validation, suggesting that our algorithm is applicable to a diverse population of TBI patients. The AUC of the developed model (0.859) in external validation exceeded the AUC of 0.846 from internal validation using the CENTER-TBI data set. This improvement is noteworthy, as the predictive ability of ML models often decreases during external validation. Such increased generalizability likely arises from training on the CENTER-TBI data set, comprising data from >60 centers across Europe and encompassing a broad range of CT imaging conditions. Therefore, the model trained on CENTER-TBI data is expected to learn diverse imaging features that can generalize to CT scans in other clinical facilities, including those in which the CINTER-TBI study was conducted.

Our model demonstrated a significant improvement in sensitivity within the severe subgroup compared with the baseline model (Table 4). This improvement may be the result of critical CT findings in severe cases that characterize patient outcomes. The developed algorithm appeared to focus more on these findings in severe cases, enhancing sensitivity in this group. Such improvement could reduce treatment failures among less-seasoned trainees or clinicians not specializing in TBI by minimizing the likelihood of overlooking severe patients who may experience unfavorable outcomes. Additionally, a substantial increase in specificity was noted among mild and moderate cases, potentially leading to fewer unnecessary clinical interventions for those expected to have favorable outcomes.

Both the Helsinki-based and our developed models displayed comparable AUC using clinical and laboratory variables. The discordant analysis indicated that prediction discrepancies between these models correlated with certain CT findings, contributing to more improved cases. Notably, many of these CT findings, including traumatic subarachnoid hemorrhage, intraparenchymal hemorrhage, and midline shift, were not components of the Helsinki CT score. This suggests that our model enhanced prognostic prediction when Helsinki CT score components were absent. Conversely, when some CT findings in the Helsinki CT score, such as subdural hematoma, mass lesion, and epidural hematoma, were present, the association between these findings and prediction discrepancy was not statistically significant. This implies that the performance of our proposed model and that of the Helsinki-based model were equivalent when the Helsinki components were present, showing that our approach, which works on raw CT imaging, considers not only the Helsinki CT score's components but also various other TBI-related CT findings.

Visualization of extracted imaging features

The lower AUC in multi-class predictions, compared with binary class prediction, likely stems from the blurred boundaries within the GOSE scale. To investigate this, imaging features were extracted from one of the middle layers in the pre-trained TimeSformer model (Fig. 1). We utilized UMAP,²⁸ a well-known non-linear dimension reduction algorithm, to visualize the multi-dimensional extracted features on a 2D plane. Two hyper-parameters in UMAP, local neighborhood size and minimum distance apart, were set at 15 and 0.1, respectively. If the ground truth GOSE had clear boundaries among the classes, several clusters would be identifiable. However, no distinct clusters were observed for four- and seven-class predictions, unlike the binary classification task (Fig. 1). This indicates that the weaker prognostic performance in multi-class prediction is not the result of our proposed ML-driven approach's limitations, but rather of the blurred boundaries inherent in the GOSE scale, an imperfection noted in prior studies.²¹,²⁹ Therefore, binary classification seems a reasonable approach as long as ML uses GOSE as ground truth data.

FIG. 1.

Two-dimensional representation of imaging features extracted from a pre-trained TimeSformer model. (a) binary class classification, (b) four-class classification, (c) seven-class classification.

Visualization through saliency map

We present a saliency map obtained by applying our developed model to the CENTER-TBI data set (Fig. 2). The saliency map delineates the degree of influence on prediction, signifying the areas that the ML model pays attention to during the prediction phase. We employed the attention rollout scheme,³⁰ a technique for quantifying information flow by utilizing attention weights in 3D ViT, to visualize the regions that our model focused on during prediction. Figure 2a displays the saliency map for a CT slice without any TBI-related abnormalities, indicating that the model attended primarily and uniformly to the interior of the brain, whereas areas outside the brain exhibited minimal association with decision making. Figure 2b and c depict the saliency maps for CT slices with skull fracture and intraparenchymal hemorrhage, respectively: typical abnormalities observed in CT scans of TBI patients.¹ These findings suggest that our model was trained to prioritize attention on TBI-related lesions over normal brain regions during the prognostication process, akin to how clinicians perform manual assessments.¹
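As a rough illustration of attention rollout,³⁰ the sketch below multiplies per-layer attention matrices after mixing each with the identity to account for residual connections. It assumes that the attention weights have already been averaged over heads and that every row sums to 1. The function names and the toy matrices are illustrative, and the mapping from token-level scores back to CT voxels used in this study is not reproduced.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def attention_rollout(layer_attentions):
    """Accumulate attention across layers.

    layer_attentions: list of n x n row-stochastic matrices, ordered from the
    first (input-side) layer to the last, already averaged over heads.
    Returns an n x n matrix whose entry [i][j] estimates how much output
    token i draws on input token j.
    """
    n = len(layer_attentions[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layer_attentions:
        # Residual connection: half of the signal bypasses the attention block.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        rollout = matmul(mixed, rollout)
    return rollout


# Toy example with three tokens and two layers (values are hypothetical).
toy_layers = [
    [[0.6, 0.4, 0.0], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]],
    [[0.3, 0.3, 0.4], [0.0, 1.0, 0.0], [0.5, 0.25, 0.25]],
]
toy_rollout = attention_rollout(toy_layers)
```

In a ViT-style classifier, the row belonging to the classification token is typically reshaped onto the patch grid to give the saliency map. Because each mixed matrix is row-stochastic, every row of the result still sums to 1.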

FIG. 2.

Visualization of the regions where the developed model allocated attention. Left: input computed tomography (CT) slices with (a) no traumatic brain injury (TBI)-related abnormalities, (b) skull fracture, and (c) intraparenchymal hemorrhage. Skull fracture and intraparenchymal hemorrhage are highlighted by rectangles. Right: saliency map, where red-colored regions indicate areas that received greater attention from the developed model compared with blue-colored regions.

The CENTER-TBI data set's diversity, featuring scans from >60 centers, allowed our model to learn from a wide range of imaging conditions, including different scanners and quality variations. This broad learning base suggests that the saliency map generated by our model can be more generalizable than those from models trained on more homogeneous data sets. However, it is important to note that the saliency map's role is to illustrate our model's reasoning for providing explainability, not for clinical decision making. The map simply highlights the model's focus on relevant brain regions and avoidance of non-pertinent areas, underscoring the effectiveness of our training process.

Related studies

Several significant prior studies on TBI prognostication are relevant to our work. Yao and coworkers¹² proposed an ML-based approach for predicting mortality among TBI patients, employing imaging features related to the shape and volume of segmented hematoma as predictors. However, this method required hematoma segmentation before mortality prediction and used hand-crafted features, thereby limiting the number of contributing imaging features. In contrast, our model offers a fully end-to-end prediction of long-term outcomes, enabling prognostication to be directly computed from CT scans and clinical variables. Moreover, as the extracted features are automatically acquired during model training, a greater number of valuable imaging features can be expected to contribute to prognostication. Pease and coworkers³¹ constructed an ML-based prognostic model utilizing CT images and clinical variables. Although their study appears analogous to our multimodal approach, key differences are evident in the requirement for manual interpretations. Pease's methodology necessitates expert manual selection of representative CT slices for input data. In contrast to this earlier work, our model eliminates the need for manual CT evaluations, facilitating prompt prognostication of TBI patients by clinicians, irrespective of their expertise level.

Another important study by Tehrani and coworkers³² proposed a method for COVID-19 mortality prediction using 3D ViT with inputs of CT scans and clinical variables. They generated an imaginary 2D image comprising clinical variables and positioned it after the final CT slice as an additional channel. However, this approach lacks clinical validity in treating clinical variables as a spatially continuous CT slice. In contrast, our study merged clinical variables with an imaging feature vector using the AEF approach, a method consistent with previous medical literature⁹ in fusing multimodal medical data.

Limitations

This study has several limitations. First, a restricted set of clinical and laboratory variables was used as predictors, comprising core clinical variables (age, pupil reactivity, and motor score) and laboratory variables (glucose and hemoglobin). These were chosen because a prior study of TBI prognostication⁸ showed that they have critical importance. Incorporating a broader array of predictors, such as biomarkers, pathological data, and genomic information, could enhance performance and enable more accurate identification of patients with favorable versus unfavorable prognoses.

Second, the data set employed for external validation was not fully balanced with respect to long-term outcomes and age distribution. Approximately 88% of patients in the CINTER-TBI data set exhibited favorable outcomes, with younger patients predominantly represented (Table 2). Further external validation using balanced data would reinforce the assessment of our proposed algorithm's applicability and generalizability. Third, not all layers of our developed model were trained to update internal weights. We adopted a computationally efficient 3D ViT architecture and fine-tuned it to accommodate large-capacity CT data. Although large computational resources, such as >30 GPUs, may be recommended,¹⁴ training the entire TimeSformer architecture could capture more imaging features characterizing TBI patients' long-term outcomes. This enhancement may also improve saliency maps, as there were instances in which the developed model did not fully attend to hematomas that were present.

Future research directions

Our study employed a dichotomized GOSE prediction with a fixed threshold (cutoff point) among the eight recovery levels, but lacks empirical justification for an ideal GOSE threshold.²¹ Further, because the current GOSE's eight classes may have subtle and overlapping boundaries, dichotomization could potentially discard valuable information on varying levels of patient recovery.²⁹ This notion is supported by our observations (Fig. 1), which showed that extracted imaging features from patients with different GOSE scores were not distinctly clustered. Investigating a new ordinal measure with clear boundaries for describing long-term outcomes could contribute to more reliable TBI prognostic models. This study primarily focuses on CT imaging for TBI prognostication, yet other modalities, such as MRI, are also important in assessing patient outcomes. Investigating the inclusion of multiple imaging types is a promising avenue for future research.

Conclusion

This study developed an innovative TBI prognostic model that significantly outperforms many existing models in predicting 6-month outcomes. The effectiveness of our proposed method was further substantiated through external validation. Unlike previous models that require manual CT assessments and clinical expertise, our developed model autonomously extracts critical imaging features and estimates patient outcomes regardless of the assessor's experience. By delivering rapid and reliable prognostications, the developed model holds potential to support clinical decision-making processes related to the treatment of TBI patients.

Transparency, Rigor, and Reproducibility Summary

We conducted a retrospective analysis on the CENTER-TBI data set to develop an ML-based binary classifier for predicting 6-month post-injury outcomes, using a sample size of 1016 participants from the primary data set and 348 participants from the CINTER-TBI data set for external validation. The prognostic performance, assessed over fivefold cross-validation, demonstrated our ML model's superiority with an AUC of 0.846 (95% CI: 0.843–0.849), further confirmed in external validation with an AUC of 0.859 (95% CI: 0.857–0.862). CENTER-TBI's data sharing policy, detailed at CENTER-TBI's Web site, ensures post-publication accessibility of individual participant data to researchers with approved study proposals, adhering to the original study's regulatory restrictions.

Footnotes

Acknowledgments

We gratefully thank all CENTER-TBI and CINTER-TBI participants and investigators.

CENTER-TBI and CINTER-TBI Participants and Investigators

Cecilia Åkerlund, Krisztina Amrein, Nada Andelic, Lasse Andreassen, Audny Anke, Anna Antoni, Gérard Audibert, Philippe Azouvi, Maria Luisa Azzolini, Ronald Bartels, Pál Barzó, Romuald Beauvais, Ronny Beer, Bo-Michael Bellander, Antonio Belli, Habib Benali, Maurizio Berardino, Luigi Beretta, Morten Blaabjerg, Peter Bragge, Alexandra Brazinova, Vibeke Brinck, Joanne Brooker, Camilla Brorsson, Andras Buki, Monika Bullinger, Manuel Cabeleira, Alessio Caccioppola, Emiliana Calappi, Maria Rosa Calvi, Peter Cameron, Guillermo Carbayo Lozano, Marco Carbonara, Simona Cavallo, Giorgio Chevallard, Arturo Chieregato, Ajay Choudhary, Giuseppe Citerio, Hans Clusmann, Mark Coburn, Jonathan Coles, Jamie D. Cooper, Marta Correia, Amra Čović, Nicola Curry, Endre Czeiter, Marek Czosnyka, Claire Dahyot-Fizelier, Paul Dark, Helen Dawes, Vincent Degos, Véronique De Keyser, Francesco Della Corte, Hugo den Boogert, Bart Depreitere, Đula Đilvesi, Abhishek Dixit, Emma Donoghue, Jens Dreier, Guy-Loup Dulière, Ari Ercole, Patrick Esser, Erzsébet Ezer, Martin Fabricius, Valery L. Feigin, Kelly Foks, Shirin Frisvold, Alex Furmanov, Pablo Gagliardo, Damien Galanaud, Dashiell Gantner, Guoyi Gao, Pradeep George, Alexandre Ghuysen, Lelde Giga, Ben Glocker, Jagoš Golubovic, Pedro A. Gomez, Johannes Gratz, Benjamin Gravesteijn, Francesca Grossi, Russell L. Gruen, Deepak Gupta, Juanita A. Haagsma, Iain Haitsma, Raimund Helbok, Eirik Helseth, Lindsay Horton, Jilske Huijben, Peter J. Hutchinson, Bram Jacobs, Stefan Jankowski, Mike Jarrett, Ji-yao Jiang, Faye Johnson, Kelly Jones, Mladen Karan, Angelos G. Kolias, Erwin Kompanje, Daniel Kondziella, Evgenios Kornaropoulos, Lars-Owe Koskinen, Noémi Kovács, Ana Kowark, Alfonso Lagares, Linda Lanyon, Steven Laureys, Fiona Lecky, Didier Ledoux, Rolf Lefering, Valerie Legrand, Aurelie Lejeune, Leon Levi, Roger Lightfoot, Hester Lingsma, Andrew I.R. Maas, Ana M. Castaño-León, Marc Maegele, Marek Majdan, Alex Manara, Geoffrey Manley, Costanza Martino, Hugues Maréchal, Julia Mattern, Catherine McMahon, Béla Melegh, David Menon, Tomas Menovsky, Ana Mikolic, Benoit Misset, Visakh Muraleedharan, Lynnette Murray, Ancuta Negru, David Nelson, Virginia Newcombe, Daan Nieboer, József Nyirádi, Otesile Olubukola, Matej Oresic, Fabrizio Ortolano, Aarno Palotie, Paul M. Parizel, Jean-François Payen, Natascha Perera, Vincent Perlbarg, Paolo Persona, Wilco Peul, Anna Piippo-Karjalainen, Matti Pirinen, Dana Pisica, Horia Ples, Suzanne Polinder, Inigo Pomposo, Jussi P. Posti, Louis Puybasset, Andreea Radoi, Arminas Ragauskas, Rahul Raj, Malinka Rambadagalla, Isabel Retel Helmrich, Jonathan Rhodes, Sylvia Richardson, Sophie Richter, Samuli Ripatti, Saulius Rocka, Cecilie Roe, Olav Roise, Jonathan Rosand, Jeffrey V. Rosenfeld, Christina Rosenlund, Guy Rosenthal, Rolf Rossaint, Sandra Rossi, Daniel Rueckert, Martin Rusnák, Juan Sahuquillo, Oliver Sakowitz, Renan Sanchez-Porras, Janos Sandor, Nadine Schäfer, Silke Schmidt, Herbert Schoechl, Guus Schoonman, Rico Frederik Schou, Elisabeth Schwendenwein, Charlie Sewalt, Ranjit D. Singh, V. D. Sinha, Toril Skandsen, Peter Smielewski, Abayomi Sorinola, Emmanuel Stamatakis, Simon Stanworth, Robert Stevens, William Stewart, Ewout W. Steyerberg, Nino Stocchetti, Nina Sundström, Riikka Takala, Viktória Tamás, Tomas Tamosuitis, Mark Steven Taylor, Aurore Thibaut, Braden Te Ao, Olli Tenovuo, Alice Theadom, Matt Thomas, Dick Tibboel, Marjolein Timmers, Christos Tolias, Tony Trapani, Cristina Maria Tudora, Andreas Unterberg, Peter Vajkoczy, Shirley Vallance, Egils Valeinis, Zoltán Vámos, Mathieu van der Jagt, Joukje van der Naalt, Gregory Van der Steen, Thijs Vande Vyvere, Jeroen T.J.M. van Dijck, Inge A. M. van Erp, Thomas A. van Essen, Wim Van Hecke, Caroline van Heugten, Ernest van Veen, Roel P. J. van Wijk, Alessia Vargiolu, Emmanuel Vega, Kimberley Velt, Jan Verheyden, Paul M. Vespa, Anne Vik, Rimantas Vilcinis, Victor Volovici, Nicole von Steinbüchel, Daphne Voormolen, Petar Vulekovic, Kevin K.W. Wang, Daniel Whitehouse, Eveline Wiegers, Guy Williams, Lindsay Wilson, Stefan Winzeck, Stefan Wolf, Zhihui Yang, Peter Ylén, Alexander Younsi, Frederick A. Zeiler, Veronika Zelinkova, Agate Ziverte, and Tommaso Zoerle

Ethical Approval

This study was approved by the Unity Health Toronto Research Ethics Board (approval number: 23-161). The CENTER-TBI study (EC grant 602150) has been conducted in accordance with all relevant laws of the European Union if directly applicable or of direct effect, and with all relevant laws of the country where the recruiting sites were located, including but not limited to, the relevant privacy and data protection laws and regulations (the “Privacy Law”), the relevant laws and regulations on the use of human materials, and all relevant guidance relating to clinical studies from time to time in force including, but not limited to, the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use Harmonised Tripartite Guideline for Good Clinical Practice (CPMP/ICH/135/95) (“ICH GCP”) and the World Medical Association Declaration of Helsinki entitled “Ethical Principles for Medical Research Involving Human Subjects.” Informed consent by the patient and/or the legal representative/next of kin was obtained, according to the local legislations, for all patients recruited in the core data set of CENTER-TBI and documented in the electronic case report forms (e-CRF). Ethical approval was obtained for each recruiting site. The list of sites, ethical committees, approval numbers and approval dates can be found online.³³

Authors' Contributions

A.H. and P.N.T. contributed to the study conception and design. A.H. implemented the algorithm and analyzed experimental results. A.H. wrote the first draft of the manuscript. M.D.C., A.B., and R.G.K. interpreted data and critically revised the manuscript. All authors read and approved the final manuscript.

Funding Information

Data used in preparation of this manuscript were obtained in the context of CENTER-TBI, a large collaborative project with the support of the European Union 7th Framework program (EC grant 602150). Additional funding was obtained from the Hannelore Kohl Stiftung (Germany), OneMind (USA), and Integra LifeSciences Corporation (USA). None of these funding sources for CENTER-TBI had any role in the writing of the manuscript or the decision to submit it for publication. This work was supported by a research grant from Nippon Steel Corporation (fund number 509533).

Data Sharing

CENTER-TBI encourages data sharing, and there is a data sharing statement published: https://center-tbi.eu/data/sharing. Individual participant data, including the data dictionary and analysis scripts, will be available immediately following publication, conditional on an approved study proposal. Data will be available to researchers who provide a methodologically sound study proposal that is approved by the management committee to achieve the aims in the approved proposal. Proposals can be submitted online at https://www.center-tbi.eu/data. A data access agreement is required, and all access must comply with regulatory restrictions imposed on the original study. The code supporting this study will become publicly accessible following publication.

Author Disclosure Statement

A.H. is supported by a PhD funding award from Nippon Steel Corporation. A.B. is a Co-Founder & Co-CEO of 16 Bit Inc., and a member of the Scientific Advisory Committee of Osteoporosis Canada. R.G.K. is on the Scientific Advisory Board of Iterative Scopes. P.N.T. is an investigator and consultant of Novo Nordisk, an officer, director and shareholder of SofTx Innovations Inc., an advisory board member of Demeter Innovation Lab, and an advisory board member of Pulsar Music Inc. M.D.C. is a recipient of grants from the Canadian Institutes of Health Research (grant numbers 471164, 471342, and 486664). The authors report no conflict of interest concerning the materials or methods used in this study or the findings specified in this article.

Supplementary Material

Supplementary Figure S1

Supplementary Figure S2

Supplementary Table S1

Supplementary Table S2

Supplementary Document S1

Supplementary Document S2

References

1. Silver, McAllister, Arciniegas. Textbook of Traumatic Brain Injury, Third Edition. American Psychiatric Association Publishing: Washington, DC; 2018.

2. Rubiano, Carney, Chesnut, et al. Global neurotrauma research challenges and opportunities. Nature, 2015; 527(7578):S193–S197; doi: 10.1038/nature16035

3. Lee, Newberg. Neuroimaging in traumatic brain imaging. NeuroRx, 2005; 2(2):372–383; doi: 10.1602/neurorx.2.2.372

4. Wysoki, Nassar, Koenigsberg, et al. Head trauma: CT scan interpretation by radiology residents versus staff radiologists. Radiology, 1998; 208(1):125–128; doi: 10.1148/radiology.208.1.964680

5. Sarigul, Bell, Chesnut, et al. Prognostication and Goals of Care Decisions in Severe Traumatic Brain Injury: A Survey of The Seattle International Severe Traumatic Brain Injury Consensus Conference Working Group. J Neurotrauma, 2023; 40(15-16):1707–1717; doi: 10.1089/neu.2022.0414

6. Marshall, Marshall, Klauber, et al. A new classification of head injury based on computerized tomography. J Neurosurg, 1991; 75(SUPPL.):S14–S20; doi: 10.3171/sup.1991.75.1s.0s14

7. Maas AIR, Hukkelhoven CWPM, Marshall, et al. Prediction of outcome in traumatic brain injury with computed tomographic characteristics: a comparison between the computed tomographic classification and combinations of computed tomographic predictors. Neurosurgery, 2005; 57(6):1173–1181; doi: 10.1227/01.NEU.0000186013.63046.6B

8. Steyerberg, Mushkudiani, Perel, et al. Predicting outcome after traumatic brain injury: Development and international validation of prognostic scores based on admission characteristics. PLoS Med, 2008; 5(8):1251–1261; doi: 10.1371/journal.pmed.0050165

9. Raj, Siironen, Skrifvars, et al. Predicting outcome in traumatic brain injury: development of a novel computerized tomography classification system (Helsinki Computerized Tomography Score). Neurosurgery, 2014; 75(6):632–647; doi: 10.1227/NEU.0000000000000533

10. Brown, Lim, Gannon, et al. Machine-learning analysis outperforms conventional statistical models and CT classification systems in predicting 6-month outcomes in pediatric patients sustaining traumatic brain injury. Neurosurg Focus, 2018; 45(5):E2; doi: 10.3171/2018.8.FOCUS17773

11. Bolvardi, Ahmadzadeh, Disfani, et al. Predicting outcomes 3 months after traumatic brain injury in patients admitted to emergency department. UNIVERSA Med, 2019; 38(3):147–155; doi: 10.18051/UnivMed.2019.v38.147-155

12. Yao, Williamson, Gryak, et al. Automated hematoma segmentation and outcome prediction for patients with traumatic brain injury. Artif Intell Med, 2020; 107:101910; doi: 10.1016/j.artmed.2020.101910

13. Dosovitskiy, Beyer, Kolesnikov, et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. [Online] In: Ninth International Conference on Learning Representations (ICLR 2021) 2021.

14. Bertasius, Wang, Torresani. Is space-time attention all you need for video understanding? Proc 38th Int Conf Mach Learn, 2021; 139:813–824.

15. […], Gan, Li, et al. Transformers in medical image analysis: a review. Intel Med, 2023; 3(1):59–78; doi: 10.1016/j.imed.2022.07.002

16. Collins, Reitsma, Altman, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med, 2015; 162(1):55–63; doi: 10.7326/M14-0697

17. Maas AIR, Menon, Steyerberg, et al. Collaborative European neurotrauma effectiveness research in traumatic brain injury (CENTER-TBI): a prospective longitudinal observational study. Neurosurgery, 2015; 76(1):67–80; doi: 10.1227/NEU.0000000000000575

18. Steyerberg, Wiegers, Sewalt, et al. Case-mix, care pathways, and outcomes in patients with traumatic brain injury in CENTER-TBI: a European prospective, multicentre, longitudinal, cohort study. Lancet Neurol, 2019; 18(10):923–934; doi: 10.1016/S1474-4422(19)30232-7

19. Jennett, Snoek, Bond, et al. Disability after severe head injury: observations on the use of the Glasgow Outcome Scale. J Neurol Neurosurg Psychiatry, 1981; 44(4):285–293; doi: 10.1136/jnnp.44.4.285

20. Farzaneh, Williamson, Gryak, et al. A hierarchical expert-guided machine learning framework for clinical decision support systems: an application to traumatic brain injury prognostication. NPJ Digit Med, 2021; 4(1):78; doi: 10.1038/s41746-021-00445-0

21. Zuckerman, Giacino, Bodien. Traumatic brain injury: What Is a Favorable outcome? J Neurotrauma, 2021; 1012(July):1010–1012; doi: 10.1089/neu.2021.0356

22. Huang, Pareek, Zamanian, et al. Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health record: a case-study in pulmonary embolism detection. Sci Rep, 2020; 10(1):1–9; doi: 10.1038/s41598-020-78888-w

23. Youden. Index for rating diagnostic tests. Cancer, 1950; 3(1):32–35; doi: 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3

24. Ranganathan, Gribskov, Nakai, et al., (eds). Encyclopedia of Bioinformatics and Computational Biology. Elsevier: Amsterdam, The Netherlands; 2019.

25. Ruopp, Perkins, Whitcomb, et al. Youden Index and optimal cut-point estimated from observations affected by a lower limit of detection. Biom J, 2008; 50(3):419–430; doi: 10.1002/bimj.200710415

26. Wasserman. The Bootstrap. In: All of Statistics: A Concise Course in Statistical Inference. Springer: New York; 2004.

27. Busby, Courtier, Glastonbury. Bias in radiology: the how and why of misses and misinterpretations. Radiographics, 2018; 38(1):236–247; doi: 10.1148/rg.2018170107

28. McInnes, Healy, Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. J Open Source Softw, 2018; 3(29):1–2; doi: 10.21105/joss.00861

29. Bhattacharyay, Milosevic, Wilson, et al. The leap to ordinal: detailed functional prognosis after traumatic brain injury with a flexible modelling approach. PLoS One, 2022; 17(7):1–29; doi: 10.1371/journal.pone.0270973

30. Abnar, Zuidema. Quantifying attention flow in transformers. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics [Online]. July 5–July 10, 2020; 4190–4197; doi: 10.18653/v1/2020.acl-main.385

31. Pease, Arefan, Barber, et al. Outcome prediction in patients with severe traumatic brain injury using deep learning from head CT scans. Radiology, 2022; 304(2):385–394; doi: 10.1148/radiol.212181

32. Tehrani SSM, Zarvani, Amiri, et al. Visual transformer and deep CNN prediction of high-risk COVID-19 infected patients using fusion of CT images and clinical data. BMC Med Inform Decis Mak, 2023; 17;23(1):265; doi: 10.1101/2022.07.26.22278084

33. CENTER-TBI. Ethical Approval. n.d. Available from: https://www.center-tbi.eu/project/ethical-approval [Last accessed: 3/6/2022].


Development of a Multimodal Machine Learning-Based Prognostication Model for Traumatic Brain Injury Using Clinical Data and Computed Tomography Scans: A CENTER-TBI and CINTER-TBI Study
