Abstract
Accurate diagnosis of traumatic brain injury (TBI) is critical to effective management and intervention, but can be challenging in patients with mild TBI. A substantial number of studies have reported the use of circulating biomarkers as signatures for TBI, capable of improving diagnostic accuracy and clinical decision making beyond current practice standards.
We performed a systematic review and meta-analysis to comprehensively and critically evaluate the existing body of evidence for the use of blood protein biomarkers (S100 calcium binding protein B [S100B], glial fibrillary acidic protein [GFAP], neuron specific enolase [NSE], ubiquitin C-terminal hydrolase-L1 [UCH-L1], tau, and neurofilament proteins) for diagnosis of intracranial lesions on CT following mild TBI. Effects of potential confounding factors and differential diagnostic performance of the included markers were explored. Further, appropriateness of study design, analysis, quality, and demonstration of clinical utility were assessed. Studies published up to October 2016 were identified through searches of MEDLINE®, Embase, EBM Reviews, the Cochrane Library, the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP), and
Editor's Note: This article is published as a Living Systematic Review. All Living Systematic Reviews will be updated at approximately three- to six-month intervals, with these updates published as supplementary material in the online version of the Journal of Neurotrauma (see Update).
Introduction
The need to manage patients with possible mTBI more effectively and efficiently–to reduce unnecessary CT scans and medical costs, while not compromising patient care and safety–has driven the quest for sensitive blood-based markers as objective parameters that can be easily and rapidly measured in the systemic circulation. Identification of biomarker signatures associated with distinct aspects of TBI pathophysiology may also be of clinical value for a more accurate characterization and risk stratification of TBI, thereby optimizing medical decision making and facilitating individualized and targeted therapeutic intervention. As such, over the past decades, a focused effort has been made to identify novel blood biomarkers for TBI, and a growing number of candidates has been described and proposed, 5 –8 leading to the recent incorporation of S100B into the Scandinavian Neurotrauma Guidelines. 9 Nonetheless, at present, the role of body fluid biomarkers in TBI is primarily relegated to research studies, and the provision of high quality evidence is paramount to meet regulatory requirements and support their adoption and routine use in clinical practice.
Meta-analysis can exploit the quantity of data collected in separate studies and provide the statistical power to assess more precise estimates of sensitivity and specificity, to determine influence of potential confounding factors on the biomarker diagnostic performance, and to detect differences in the accuracy of different marker tests. Hence, we conducted a systematic review and meta-analysis to comprehensively summarize and critically evaluate the existing body of evidence for the use of blood protein biomarkers for diagnosis of brain injury as assessed by CT in adult patients presenting to the ED after mild head trauma.
We focused on markers for which promising scientific evidence of analytical and clinical validity is available and which, therefore, are likely to be rapidly transferable to clinical practice; namely, S100 calcium binding protein B (S100B), glial fibrillary acidic protein (GFAP), neuron specific enolase (NSE), ubiquitin C-terminal hydrolase-L1 (UCH-L1), and tau and neurofilament proteins. As TBI biomarker research and technological and analytical advances are dynamic, we felt that a living systematic review–a high quality, online review that is updated as new research becomes available 10 –would best fit our purpose. The “living” nature of such work will permit the inclusion and investigation of novel markers, marker combinations, and more refined diagnostic time windows as the relevant body of scientific evidence accumulates.
Methods
This review is being prepared as a “living systematic review,” initiated in the context of the CENTER-TBI project (
Information sources
We searched Ovid MEDLINE® (1946 to October 2016), Ovid Embase (1980 to October 2016), Ovid Evidence-Based Medicine (EBM) Reviews (October 2016), and the Cochrane Library (October 2016) for relevant studies. The search strategies used can be found in the supplementary Appendix (see online supplementary material at
For possible ongoing trials and studies, we searched the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) (searched November 2016) and
Additional studies were identified by reviewing the reference lists of published clinical trials and relevant narratives as well as systematic reviews. Abstracts from relevant scientific meetings were also examined, and experts in the field were consulted for any further studies.
Citations were uploaded into a web-based systematic review program (Covidence, Alfred Health Melbourne, Australia) (
Study selection
Two reviewers independently reviewed the title and abstract of each citation identified by the search strategy. In the second stage, the full text was reviewed and eligible studies selected. Any disagreement between the two authors was resolved through discussion, or where necessary, arbitration by a third party. Studies were included if the article met the prespecified list of eligibility criteria: studies enrolling adult patients presenting to the ED with a history of possible brain injury complying with any authors' definition of mTBI; report of the admission head CT findings; at least one quantitative measurement of the circulating biomarkers of interest (S100B, GFAP, NSE, UCH-L1, tau, and neurofilament proteins) on admission; and relevant accuracy data.
We included studies containing mixed populations; that is, participants with moderate and severe TBI (Glasgow Coma Score [GCS] <13) or pediatric populations. Studies were included irrespective of their geographic location and language of publication. We excluded studies using non-quantitative methods to assess biomarker concentrations (e.g., Western blot or explorative proteomics). Studies with small cohorts (< 50 participants) were excluded, given the high likelihood of their being underpowered and therefore impacting the reliability of findings.
Data extraction and assessment of methodological quality
Two reviewers independently extracted data using a standardized data abstraction form. We abstracted relevant information related to the study design, patient characteristics (demographic and clinical data, including indices of injury severity, presence of extracerebral injuries and polytrauma, and CT findings) and biomarker characteristics (concentrations, sampling time, cutoffs, and statistical levels of diagnostic accuracy [sensitivity and specificity]), analytical aspects of biomarker testing, and study limitations. Details regarding the definition of mTBI and CT abnormality were also extracted.
In the case of multiple studies from the same research group, authors were contacted to ensure that there was no overlap in patient populations. We also contacted authors for clarification of study sample, missing data, or ambiguity in the cutoffs used. If biomarker measurements were taken at multiple time points, we used the sample on admission for analysis.
The methodological quality of the included studies was independently assessed by two reviewers using a modified version of the tool for quality assessment of studies of diagnostic accuracy included in systematic reviews (QUADAS-2), 14 as recommended by the Cochrane Collaboration. Discrepancies were resolved through discussion or arbitration by a third reviewer.
Statistical analysis and data synthesis
The analysis includes a structured narrative synthesis. We constructed evidentiary tables identifying the results pertinent to diagnostic capabilities of the different biomarkers (detection of intracranial lesions as assessed by CT) and study characteristics for all included studies. We conducted exploratory analyses by plotting estimates of sensitivity and specificity from each study on forest plots and in receiver operating characteristic (ROC) space.
Where adequate data were available, we performed meta-analyses for each biomarker, to summarize data and obtain more precise estimates of diagnostic performance. For studies with diverse thresholds, we meta-analyzed pairs of sensitivity and specificity using the hierarchical summary ROC (HSROC) model, which allows for the possibility of variation in threshold between studies, and also accounts for variation among studies and any potential correlation between sensitivity and specificity. 15 For these analyses, we used the NLMIXED procedure in SAS software (version 9.4; SAS Institute 2011, Cary, NC). For studies that reported data at common prespecified cutoff values, we calculated the pooled estimates of sensitivity and specificity (clinically interpretable), by undertaking a random effects bivariate regression approach. 16
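For orientation, the random effects bivariate approach mentioned above is commonly written as shown below. This is a generic textbook formulation, not a transcription of the model code used in this review; the symbols $\mu_A$, $\mu_B$, $\Sigma$, $n_{1i}$, and $n_{0i}$ are notation introduced here for illustration ($n_{1i}$ and $n_{0i}$ denote the numbers of CT-positive and CT-negative patients in study $i$).

```latex
\begin{aligned}
\mathrm{TP}_i &\sim \operatorname{Binomial}(n_{1i}, \mathrm{Se}_i), \qquad
\mathrm{TN}_i \sim \operatorname{Binomial}(n_{0i}, \mathrm{Sp}_i), \\[4pt]
\begin{pmatrix} \operatorname{logit}(\mathrm{Se}_i) \\ \operatorname{logit}(\mathrm{Sp}_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},\;
\Sigma = \begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}
\right), \\[4pt]
\widehat{\mathrm{Se}} &= \operatorname{logit}^{-1}(\mu_A), \qquad
\widehat{\mathrm{Sp}} = \operatorname{logit}^{-1}(\mu_B).
\end{aligned}
```

The off-diagonal term $\sigma_{AB}$ is what lets the model capture the correlation between sensitivity and specificity across studies, which typically arises when studies operate at different implicit or explicit thresholds.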
We explored heterogeneity through visual examination of the forest plot and the SROC plot for each biomarker. However, as there were insufficient studies, lack of individual data, and/or important variation across studies with simultaneous presence of factors with potentially diverging effects on biomarker accuracy estimates, we did not perform meta-regression (by including each potential source of heterogeneity as a covariate in the bivariate model) as planned.
Sensitivity analyses were performed to check the robustness of the results. We used Cook's distance to identify particularly influential studies, and checked for outliers using scatter plots of the standardized predicted random effects. Then, the robustness of the results was checked by refitting the model excluding any outliers and very influential studies. Sensitivity analyses were also conducted to investigate the impact on biomarker performance of studies including mixed populations, bias in the selection of participants, high prevalence of abnormal CT findings, and different definitions of TBI as assessed by CT.
Data processing and statistical analyses were conducted using Review Manager version 5.3 (Cochrane Collaboration, Copenhagen, Denmark) and STATA version 13.0 (StataCorp, College Station, TX), including the user-written commands METANDI and MIDAS.
Quality of the evidence
The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) 17 approach was used to assess the overall quality of evidence of the included biomarker tests. The results were summarized using GRADEPro software (Version 3.2, 2008).
Results
Description of studies
Our search strategy identified a total of 7260 citations. Removal of duplicates resulted in 5567 distinct citations, of which 90 full-text articles were assessed for eligibility, and 26 articles 3,18–42 were included in the systematic review (Fig. 1, flow diagram of search and eligibility results, and Table 1). Tables 2 and 3 show the main characteristics of the included publications, and additional details are provided in Tables S1 and S2 (see online supplementary material at

Study flow diagram.
Summary of the Number and Characteristics of Primary Articles Identified for Each Biomarker
GCS, Glasgow Coma Scale; S100B, S100 calcium binding protein B; GFAP, glial fibrillary acidic protein; NSE, neuron specific enolase; UCH-L1, ubiquitin C-terminal hydrolase-L1.
Characteristics of the 26 Included Studies
Mean (SD) unless stated otherwise.
ACEP/CDC, American College of Emergency Physicians/ Centers for Disease Control and Prevention; BM, biomarker; ECI, extracranial injury; ED, emergency department; GCS, Glasgow Coma Scale; GFAP, glial fibrillary acidic protein; IQR, interquartile range; LOC, loss of consciousness; MHI, mild head injury; MHT, mild head trauma; mTBI, mild traumatic brain injury; NR, not reported; NSE, neuron specific enolase; PAI, platelet aggregation inhibitor; PTA, post-traumatic amnesia; S100B, S100 calcium binding protein B; UCH-L1, ubiquitin C-terminal hydrolase-L1.
Biomarker Characteristics of the 26 Included Studies
Mean (SD) unless stated otherwise.
Additional thresholds have been evaluated.
Median (IQR) unless stated otherwise.
Control group definition:
• Negative Control Group: healthy individuals (e.g., healthy volunteers, voluntary blood donors, outpatients for routine blood work) who were checked on their health and potential head trauma status.
• Positive Control Group: patients with moderate to severe brain injury.
• Orthopedic Control Group: non–brain-injured patients presenting to the ED with a single-limb orthopedic injury without blunt head trauma.
• MVA Control Group: patients presenting to the ED after a motor vehicle crash without blunt head trauma.
BM, biomarker; CV, coefficient of variation; ECLIA, electrochemiluminescence immunoassay; ED, emergency department; ELISA, enzyme-linked immunosorbent assay; GCS, Glasgow Coma Score; GFAP, glial fibrillary acidic protein; H0, within 3 h after the clinical event; H+3, 3 h after the first sampling; IFMA, immunofluorometric assay; ILMA, immunoluminometric assay; IQR, interquartile range; LIA, luminescence immunoassay; LLOD, lower limit of detection; LLOQ, lower limit of quantification; LOD, limit of detection; mTBI, mild traumatic brain injury; MVA, motor vehicle accident; NA, not applicable; NR, not reported; NSE, neuron specific enolase; pts, patients; RIA, radioimmunoassay; S100B, S100 calcium binding protein B; SEM, standard error of the mean; TBI, traumatic brain injury; UCH-L1, ubiquitin C-terminal hydrolase-L1; ULOQ, upper limit of quantification.
Two of the 26 included articles reported biomarker results from the same patient cohort. 34,43 All studies were published in 2000 or later. With the exception of one study published in French, 21 and one published in Italian, 24 all studies were published in English.
The total number of patients with TBI in the included studies was 8127, ranging from 50 28,37 to 1560 42 per study (median 170, interquartile range 104–258). Of those, 865 had positive CT scans, with an average prevalence of 17% (median 13%) (range 5–51%) (Table 2). Table S2 shows the criteria used for the definition of TBI/mTBI and positive CT scans (reference standard) in the different studies. In nine articles, the presence of a skull fracture was considered as a traumatic CT abnormality.
The reported mean or median age of the included patients ranged from 32 38 to 83 years, 39 with 10 studies including children and/or adolescents (patient age <18 years). The total subject pool was largely male (median 63% across the studies), with the exception of the study by Thaler and colleagues, which was 68.7% female. 39 Two cohort studies included mild to severe TBI patients (GCS 3–15), 29,38 and two other cohorts included mild to moderate TBI patients (GCS 9–15). 34 –36,40 Six studies enrolled TBI patients with multiple trauma and/or extracranial injuries (Table 2). Nine of the included articles reported biomarker concentration from different types of control cohorts, including healthy individuals or non–brain-injured trauma patients (see Table 3 for details).
Most of the studies defined the specific time frame from injury to blood draw as an inclusion criterion, with the majority of the samples collected within 6 h of injury (16 studies) and with mean or median time ranging from 24.3 min 33 to 5 h (Table 3). 28 In one study, samples were collected within 12 h, 31 and in two studies, they were collected within 24 h. 29,38
A single marker was evaluated in most of the studies (n = 21), while one study simultaneously assessed three markers. 40 Of the eligible studies, 22 reported data on S100B (total number of TBI patients 7754), 4 reported data on GFAP (total number of TBI patients 783), 3 reported data on NSE (total number of TBI patients 314), and 2 reported data on UCH-L1 (total number of TBI patients 347). Fewer data were available for tau (one study that included only 50 patients), 28 and we found no studies evaluating neurofilament proteins that met our inclusion criteria.
Methodological quality
The assessments of the methodological quality and risk of bias of the included studies are presented in Figure 2 and Figure S1 (see online supplementary material at

In half of the studies, thresholds were not prespecified, and ROC analyses were used to determine optimal cutoffs, likely resulting in an overestimation of the diagnostic accuracy of the biomarker evaluated. In addition, the inclusion of skull fracture as a CT abnormality may cause inflation of the accuracy estimates of S100B, whereas, using a brain-specific marker as an index test may result in patients with skull fractures being misclassified as false negative. Finally, in different domains, a substantial number of studies were considered to be at unclear risk of bias because of substandard reporting. We investigated the effect of these factors in sensitivity and subgroup analyses.
S100B
The accuracy of S100B for detecting intracranial lesions on CT scan was evaluated in 22 studies (7754 patients). 3,18 –27,30–33,36 –42 The individual sensitivities and the specificities were between 72% and 100% and between 5% and 77%, respectively (Fig. 3). All but six of the included studies used the same cutoff (0.10–0.11 μg/L), which represents the 95th percentile of a healthy reference population and is conventionally considered to distinguish physiological from pathophysiological serum concentrations. 3 Seven studies reported multiple cutoffs (Table 3). The summary ROC curve showing the accuracy of S100B across all the studies, regardless of the threshold used, is presented in Figure 4.

Forest plot showing individual sensitivity and specificity of circulating S100 calcium binding protein B (S100B), glial fibrillary acidic protein (GFAP), neuron specific enolase (NSE), and ubiquitin C-terminal hydrolase-L1 (UCH-L1) for detection of intracranial lesions on CT. Horizontal lines represent 95% confidence intervals. TP, true positive; FP, false positive; FN, false negative; TN, true negative.

In terms of the assays/platforms used, most of the studies (13/22) used an automated electrochemiluminescence immunoassay (ECLIA) on an Elecsys® analyzer (Roche Diagnostics), while one used the Cobas 6000 analyzer (Roche Diagnostics). There were four studies conducted using an automated immunoluminometric assay (ILMA) on a Liaison® analyzer (Diasorin), and one was conducted on LIA®-mat (Sangtec® 100); one study used a radioimmunoassay (Sangtec), and one used an enzyme-linked immunosorbent assay (ELISA) platform (Banyan Biomarkers, Inc.) (Table 3). In one study, the analytical performance of the two automated immunoassays (i.e., Diasorin and Roche Diagnostics assays) was compared and, although not interchangeable, the two methods strongly correlated and appeared usable in a similar manner. 27
Performance of S100B at a 0.10–0.11μg/L cutoff value
To obtain clinically relevant estimates of the performance of S100B, we pooled the results from the 16 studies using the cutoff value of 0.10–0.11μg/L. The individual sensitivities and the specificities for each study included in this meta-analysis were between 72% and 100% and between 5% and 77%, respectively (Fig. 5). The following summary estimates were obtained: sensitivity 96% (95% CI 92–98%), specificity 31% (95% CI 27–36%), positive likelihood ratio 1.4 (1.3–1.5) and negative likelihood ratio 0.12 (0.06–0.25). Figure 5 shows the pooled sensitivity and specificity (the solid red spot in the middle) and the 95% confidence and prediction regions (the inner and outer ellipses, respectively).
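As a rough cross-check of how the likelihood ratios relate to the pooled sensitivity and specificity, the standard definitions can be applied to the rounded point estimates. This is a minimal sketch: the function name is hypothetical, and because the reported ratios come from the fitted model rather than from rounded summary values, small differences from the published figures (1.4 and 0.12) are expected.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) using the standard definitions.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded pooled estimates reported in the text for the 0.10-0.11 ug/L cutoff.
lr_pos, lr_neg = likelihood_ratios(0.96, 0.31)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # LR+ = 1.39, LR- = 0.13
```

A positive likelihood ratio close to 1 and a negative likelihood ratio near 0.1 is the typical profile of a rule-out test: a negative result lowers the probability of a lesion substantially, whereas a positive result changes it very little.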

Summary receiver operating characteristics plot of sensitivity and specificity of S100 calcium binding protein B (S100B) at a 0.10–0.11μg/L cutoff value for detecting intracranial lesions on CT. Each circle represents an individual study; size of the symbol reflects the number of patients in the studies; red solid spot in the middle is summary sensitivity and specificity; inner ellipse represents 95% confidence region, and outer ellipse represents 95% prediction region.
There was a significant level of heterogeneity in the results, greater for specificity than for sensitivity (Fig. 5). The value for sensitivity was >80% in all the studies but one. 41 The value for specificity was mainly >30%; in the remaining studies, the low specificity was accompanied by a very high sensitivity. However, because of important variation across studies with simultaneous presence of factors (time, presence of extracranial injuries, mixed populations) (Fig. S2) with potentially contrasting effects on the accuracy estimates and lack of individual data and/or insufficient number of studies, we were unable to compare patient characteristics and investigate the effect of the planned sources of heterogeneity (see online supplementary material at
One study was an outlier (Zongo and colleagues). 42 Exclusion of this study left sensitivity essentially unchanged (96.3% vs 96.1%); however, specificity increased from 31% to 33%. This could be explained by the fact that in this study, which included the greatest number of patients, S100B levels were measured in plasma, thus increasing the probability of false positive results (Fig. S3) (see online supplementary material at
To explore the effect of risk of bias in the patient selection domain on the summary estimates, we excluded eight studies considered at high (n = 1) or unclear (n = 7) risk of bias. The exclusion of these studies slightly improved sensitivity (98%) (Fig. S4) (see online supplementary material at
The prevalence of CT findings was relatively high (> 11%) in seven studies. Excluding these studies resulted in a slight increase in sensitivity and a slight decrease in specificity (98% and 29%, respectively). Finally, eight studies considered skull fracture as a CT abnormality. To explore the impact of the type of reference standard on the summary estimates, we excluded these studies as well as those in which this information was unclearly reported. The exclusion of these studies slightly impacted sensitivity and specificity (93% and 35%, respectively) (Fig. S4).
Quality of evidence of S100B
The quality of the evidence for the use of blood S100B levels to diagnose brain injury as assessed by CT scan in patients with mild TBI was moderate (Fig. 6).

Summary of evidence for the use of blood S100 calcium binding protein B (S100B) protein concentrations (0.10–0.11μg/L cutoff) to diagnose brain injury as assessed by CT scan in patients with mild traumatic brain injury (mTBI).
GFAP
Eligible studies reporting the accuracy of GFAP for detecting intracranial lesions on CT scan comprised three cohorts with mild to moderate TBI patients and one cohort with mild to severe TBI patients (783 patients) (Figs. 2 and 3). 29,34,36,40 All studies were recent publications (2012–2016).
The individual sensitivities were between 67% and 100%, whereas the specificities were between 0% and 89%. Sensitivities were sufficiently homogeneous, whereas specificities were clearly heterogeneous. The thresholds used, ranging from 0 ng/mL 40 to 0.6 ng/mL, 29 were not prespecified and were determined from ROC analyses. The summary ROC curve of the accuracy of GFAP across all four studies, regardless of the threshold used, is shown in Figure 3.
The planned comparison between S100B and GFAP diagnostic performance was not possible, because of the limited number of studies and different spectrum of patients available for GFAP.
NSE
The accuracy of NSE for discriminating TBI patients with intracranial lesions on CT scanning from those without lesions was evaluated in three studies (314 patients). 33,41 Figure 2 shows a forest plot of the individual study estimates of sensitivity and specificity. The sensitivities were between 56% and 100%, whereas the specificities were between 7% and 77%. The studies reported a considerable variation in the threshold adopted, ranging from 9 to 14.7 μg/L (Table 3).
UCH-L1
The accuracy of the initial circulating UCH-L1 levels for detection of intracranial lesions on CT was evaluated in two very recent studies (96 and 251 patients, respectively), 35,40 both including mild to moderate adult TBI patients (GCS 9–15). The two studies yielded the same sensitivity of 100% (95% CI 88–100) and specificities of 21% (95% CI 12–32) and 39% (95% CI 33–46) (Fig. 2). They reported similar thresholds (0.029–0.04 ng/mL) and used the same assay (Table 3).
Tau
The accuracy of circulating tau (cleaved tau [C-tau]) for diagnosis of CT abnormalities was evaluated only in one small study (50 patients). 28 The sensitivity was 50%, whereas the specificity was 75%. Among the 10 patients with abnormal findings on CT enrolled in this study, 5 (50%) had no detectable C-tau levels.
Discussion
In this systematic review, we have provided a comprehensive and thorough examination of the literature on protein biomarker diagnostic signatures for traumatic brain lesions to define how to best take advantage of these tests in ED daily patient care. We found that of the six biomarkers explored, current evidence only supports the measurement of S100B to help informed decision making in patients presenting to the ED with suspected intracranial lesion following mild TBI, possibly reducing resource use. There is as yet insufficient evidence that GFAP, NSE, and UCH-L1 are ready for clinical application, despite their unequivocal association with TBI. Further, tau and neurofilament proteins were analyzed in too few studies to draw any meaningful conclusions. Importantly, serious problems were observed in many of the studies, ranging from unfocused design and inappropriate target groups to biased reporting and inadequate analysis. These points are further elaborated in the subsequent discussions.
S100B
Our findings demonstrate the clinical utility of S100B for the intended use of allowing physicians to be more selective in their use of CT without compromising care of patients with mTBI. More specifically, the 16 studies applying the same prespecified cutoff of 0.10–0.11 μg/L yielded a pooled sensitivity of 96% (95% CI 92–98%) and a pooled specificity of 31% (95% CI 27–36%). A pre-test probability of 10% 44 would mean that, overall, 100 of 1000 tested patients will have a final diagnosis of intracranial lesion. The pooled results obtained for sensitivity and specificity would mean that, of these, between 92 and 98 will test positive (true positives) and 2–8 will test negative (false negatives). Of the 900 with negative CT, between 243 and 324 will test negative (true negatives) and between 576 and 657 will test positive (false positives) (Fig. 6).
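The natural-frequency figures in this paragraph can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and the pairing of confidence limits (lower sensitivity with lower specificity, and so on) are assumptions made here for convenience, and the negative predictive value it prints is derived from these inputs rather than taken from the review.

```python
def natural_frequencies(n, prevalence, sensitivity, specificity):
    """Expected 2x2 counts for n tested patients at a given pre-test probability."""
    with_lesion = n * prevalence          # patients with a CT lesion
    without_lesion = n - with_lesion      # patients with a negative CT
    tp = with_lesion * sensitivity        # lesion present, biomarker positive
    fn = with_lesion - tp                 # lesion present, biomarker negative
    tn = without_lesion * specificity     # no lesion, biomarker negative
    fp = without_lesion - tn              # no lesion, biomarker positive
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


# Pooled estimates and 95% CI limits reported for the 0.10-0.11 ug/L cutoff.
scenarios = [
    ("lower CI limits", 0.92, 0.27),
    ("point estimates", 0.96, 0.31),
    ("upper CI limits", 0.98, 0.36),
]

for label, sens, spec in scenarios:
    counts = natural_frequencies(1000, 0.10, sens, spec)
    npv = counts["TN"] / (counts["TN"] + counts["FN"])
    rounded = {key: round(value) for key, value in counts.items()}
    print(f"{label}: {rounded}, NPV = {npv:.3f}")
```

At the point estimates this gives 96 true positives, 4 false negatives, 279 true negatives, and 621 false positives per 1000 patients, so roughly 28% of patients would test negative and could potentially avoid a CT scan under these assumptions.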
Even though this high sensitivity and excellent negative predictive value look promising, information regarding which lesions could be missed and the associated consequences, if left untreated, is particularly relevant to the broad acceptance and adoption of S100B by the medical community. Accordingly, there is an ongoing debate about the risk of sending home a misdiagnosed patient with a potentially life-threatening condition such as an epidural hemorrhage. From the available data, 3,19,30,32,39,42 we were unable to identify specific types of injury that were systematically missed, although subdural hematomas were slightly more frequently misclassified as false negatives. We speculate that this may be because of the brain lesion location and/or extension as well as the pathoanatomical and neurovascular features of the different injuries that cause an altered or delayed leakage of S100B into the circulation. Importantly, one study 30 demonstrated that lesions requiring surgery (one subdural hematoma and one epidural hematoma) were missed by S100B, thereby indicating that this marker, if used alone as a diagnostic tool, is not completely reliable. Given that distinct patterns of injury are linked to patient-specific variability, efforts must be made to develop advanced multiparameter-based solutions integrating marker signature and patient features. Such multimodal prediction models could be more suitable for an accurate diagnosis, characterization of injury types, and risk stratification of mTBI patients. 45
It will also be critical to estimate the independent and complementary value of biomarkers and determine whether this strategy provides added diagnostic utility when combined with a careful clinical assessment or when integrated into existing clinical decision rules for the selective use of CT, such as the CT in Head Injury Patients (CHIP) model, 46 the New Orleans criteria, 4 or the Canadian Head CT rule. 47 Unless a biomarker-based approach yields an incremental diagnostic value and clearly demonstrates its superiority over standard, readily available patient characteristics, broad acceptance in medical practice is unlikely. 48
Reliability and reproducibility of S100B results also require a critical consideration of the comparability and potential variability in biomarker measurements when using assays from different manufacturers. We found the adoption of a relatively uniform and standardized approach for S100B determination, with 14 studies using the ECLIA Elecsys® Roche and 2 studies using the ILMA LIA-mat Sangtec 100. These two automated immunometric assays have been demonstrated to have a good correlation, with almost identical diagnostic capability, 27 making it unlikely that this factor influenced our conclusions. A comparable level of consistency in analytical methods and assays used is not available for any of the other biomarkers considered in this review.
Our review showed that the results across S100B studies using the prespecified cutoff were consistent in terms of sensitivities and specificities, with only one outlier showing an exceptionally low specificity (12%). 42 A plausible explanation for this anomaly is that in this study, plasma samples were used to measure S100B. This interpretation fits well with evidence from previous literature demonstrating how the interference of the anticoagulant on the immunoreactivity for S100B can alter its levels relative to serum (values higher by ∼20%). 49 Consequently, in the study of Zongo and colleagues, the use of the prespecified cutoff for serum inevitably resulted in a systematic increase of false positive results. 42 This observation, while complicating the analysis of S100B blood levels, points to the need for a more exhaustive knowledge and understanding of pre-analytical factors as potential confounders and sources of variability, and supports the adoption of different cutoff values, depending on the sample type used. Intriguingly, this observation suggests that plasma could be more suitable and possibly desirable for measuring S100B levels in mild TBI patients, because of very low concentrations in this population. However, even after removing the outlier, a considerable heterogeneity remained, necessitating caution when interpreting analysis results.
Investigations from multiple research groups provided evidence that a series of factors other than the brain injury may influence levels of biomarkers in the circulation and, therefore, the diagnostic accuracies. Such factors encompass biomarker characteristics such as molecular weight, injury-specific release mechanisms, and clearance (Table S1); 50,51 patient features including presence of extracranial injuries or polytrauma, intoxication, location of the injury, and even genetics; pre-analytical and laboratory-dependent procedures including all steps from management of equipment to execution of assays and manufacturing processes; and post-analytical data handling. 19,52 –54 We were not able, however, to systematically investigate these potential sources of heterogeneity, because of a substantial variation across studies, the suboptimal reporting of patient and study information, and the coexistence in the same study of factors with contrasting or controversial effects on the accuracy estimates. Taken together, these findings demonstrate that future research must be refined by improvements in study design as well as standards and characterization of patient selection (see box on page 17).
In this regard, surprisingly, we noted that to date no attempt has been made to specifically investigate the effect of comorbidities and sex on the diagnostic performance of S100B or any other marker. Sex is recognized as a primary determinant of biological variability, responsible for anatomical, neurochemical, and functional brain connectivity differences, heavily influencing neurobiological and neuropathophysiological response. 55 It is also associated with important differences in hormones, metabolism, and the immunological system, which in turn may interfere with the determination of circulating TBI biomarkers. 56 Factoring sex into research designs and analyses is a theme under active debate, and is considered fundamental to rigorous and relevant biomedical research. Hence, we emphasize that this is a critical knowledge gap for future investigation, especially in light of the mounting evidence of the changing gender pattern caused by the shift in the TBI population toward older age, also at risk of multiple comorbid conditions (see Thaler and colleagues). 39
Systematic reviews and meta-analyses of individual participant data (IPD) may represent a powerful approach to overcome some of these gaps and limitations, 57 also supported by the current initiatives to share clinical data and the establishment of common repositories, such as the Federal Interagency Traumatic Brain Injury Research (FITBIR) database (
Clinical application of S100B implies that choosing the right assessment time point (time between injury and sampling) 59 is an integral part of the test. Based on the results of S100B kinetics studies, guidelines have specifically indicated a time window within 3 h 9,60 to 6 h 9 post-injury for S100B to detect intracranial lesions. A recent study supported a 3 h window for safe rule-out of acute intracranial lesion in clinical practice, showing that a second blood sampling 3 h after the first one was not informative and resulted in a non-trivial loss of sensitivity of ∼6% (e.g., eight patients with positive CT would have been missed). 27 We were unable to further address this specific issue in this review because of the heterogeneity in study design. In addition to post-injury delays in sampling, the delay from obtaining samples to processing and analysis, and the storage conditions during this delay, could both be important modulators of S100B stability and assay results. Age, gender, and comorbidities, alone or in combination, can also importantly affect the kinetics of S100B. 61 Future studies should clarify whether these variables need to be considered and how they may influence biomarker results and their interpretation.
The results of our study expand and corroborate those from previous systematic reviews and meta-analyses, 62 –64 and confirm that the implementation of S100B might allow a reduction of the number of CT scans by ∼30%. 3 These considerations also have broad financial implications for healthcare costs. However, none of the studies in our review explored the cost effectiveness of the use of biomarkers, and the few economic studies and data in the literature are controversial. An earlier study by Ruan and colleagues 65 reported a limited effect of S100B on healthcare resources and a potential economic impact only in specific clinical scenarios (i.e., CT scanning rate >78% or a faster turnaround time of biomarker results of at least 96 min compared with CT scan results). Conversely, in a more recent cost analysis conducted in a Swedish regional hospital, the clinical use of S100B incorporated into the Scandinavian guidelines substantially reduced healthcare costs, especially in cases of strict adherence to management recommendations (71€ per patient). 66 These results are not generalizable, and must be carefully interpreted according to their specific contexts, because of the differences across countries, healthcare systems, hospital settings, and ensuing care patterns. To refine cost calculations, future studies should take these factors into consideration, as well as CT overutilization and the socioeconomic costs associated with increased cancer risks from CT scans. Clear demonstration of cost saving and added benefits beyond those obtained by current management strategies for mTBI are essential for TBI biomarkers to be adopted and widely used by the medical community.
GFAP
Recent narrative reviews have outlined the potential of GFAP for identifying patients with intracranial lesions after head trauma, 7 but none of these used systematic review methods or meta-analyses. In the meta-analysis reported here, we included four studies, in which the diagnostic accuracy of GFAP reflected sensitivities of 67% 29 to 100% 36,40 and specificities of 0% 40 to 100%. 29 Although promising, these results must be approached with caution, because the studies included patients with severe and moderate TBI not representative of the target population of the test (the median prevalence of abnormal CT findings across the studies was 22%), and thresholds were not prespecified, factors that may have inflated the accuracy estimates. 67 For diagnostic validation, it will be fundamental to establish reliable and valid thresholds. Also, GFAP needs to be tested in larger clinical studies with a focus on the intended use. 68,69 To this end, it has been argued that studies investigating the implementation of biomarker measurements in guidelines for mTBI management (to avoid use of unnecessary CT) should be limited to patients currently recommended for such examination (GCS 14–15), therefore excluding patients with GCS score of 13 for whom biomarker assessment would not add to clinical examination. 9 As mentioned earlier, the definition of these setting-specific characteristics is also critical for performing reliable cost analyses and determining the primary economic advantage of using blood biomarkers as a pre-head CT screening tool.
A meaningful comparison between GFAP and S100B diagnostic performances was precluded by a substantial difference in study populations. In this context, we note that TBI biomarkers discussed in this review are usually considered individually. Further work should more consistently explore simultaneous assessment of multiple biomarkers providing the framework for comparing the accuracy of tests that have directly been compared in individual studies.
NSE and UCH-L1
The relative dearth of studies evaluating the diagnostic accuracy of NSE, UCH-L1, and Tau in the ED for identifying patients with intracranial lesions following mTBI hampered the possibility of performing meta-analyses. The diagnostic value of NSE remains uncertain, with studies showing remarkable variations and inconsistency. In contrast, the accuracy of UCH-L1 for detecting intracranial lesions on CT scan was evaluated in two studies that yielded an optimal sensitivity (100%) but modest specificities (21–39%). Similar to GFAP, the thresholds used were not prespecified, and the studies included patients with mild to moderate TBI (GCS 9–15). Hence, further studies are required to confirm the reproducibility of these findings and to determine clinical utility in daily bedside care.
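To make concrete what a sensitivity of 100% combined with specificities of 21–39% would mean for a rule-out strategy, the following minimal sketch applies the standard relationships between sensitivity, specificity, and prevalence. The function name `rule_out_summary` and the prevalence of 10% are assumptions introduced for illustration; the prevalence in particular is not taken from the UCH-L1 studies, and the output should not be read as an estimate of real-world performance.

```python
def rule_out_summary(sensitivity, specificity, prevalence):
    """Return (fraction of patients testing negative, negative predictive value).

    Patients who test negative are those who could, in principle, forgo CT.
    """
    true_neg = (1.0 - prevalence) * specificity    # CT-negative, test-negative
    false_neg = prevalence * (1.0 - sensitivity)   # CT-positive, test-negative (missed)
    test_negative = true_neg + false_neg
    npv = true_neg / test_negative if test_negative > 0 else float("nan")
    return test_negative, npv


# Assumed prevalence of CT-positive lesions; hypothetical, for illustration only.
ASSUMED_PREVALENCE = 0.10

# Sensitivity and specificity range as reported in the text for UCH-L1.
for spec in (0.21, 0.39):
    avoidable, npv = rule_out_summary(1.00, spec, ASSUMED_PREVALENCE)
    print(f"specificity={spec:.2f}: test-negative fraction={avoidable:.3f}, NPV={npv:.2f}")
```

Under these assumptions, no CT-positive patient is missed, but only a minority of patients test negative, so the potential reduction in scans is bounded by the modest specificity. Because the thresholds were not prespecified, the underlying accuracy figures may themselves be optimistic.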
Tau and neurofilament proteins
There is insufficient evidence to support the clinical validity of initial circulating c-Tau or neurofilament protein concentrations for the management of patients with mTBI.
Implications for research and practice: Strengths and weaknesses of the review
Current understanding recognizes that the complex pathobiology of TBI most probably requires multifaceted, multimodal approaches that integrate biomarkers and traditional clinical characteristics to allow a more powerful and accurate characterization and risk stratification of mTBI, 45,70 a premise currently insufficiently reflected in the literature. In addition, if the different biomarkers do indeed reflect different pathophysiological processes 51 with independent information about imaging abnormality, outcome impact, and different diagnostic windows, it is possible that the use of a panel of biomarkers may substantially increase the diagnostic specificity for the end-point of interest. 71,72 Unfortunately, to date, only a few such studies are available. More data are needed to evaluate whether a multi-marker approach based on a panel of biomarkers with distinct time-dependent discriminatory accuracy provides a better performance for the detection and characterization of TBI.
Further, we should be cautious in using CT as a gold standard to judge the performance of circulating biomarkers. When compared with MRI, there is increasing recognition that X-ray CT provides poor sensitivity for structural lesions in TBI such as microbleeds and diffuse axonal injury. 73,74 It follows that we cannot assume that false positivity in detection of CT-visible abnormality equates to false positivity in detection of structural injury, because some of these false positives may be associated with abnormalities on MRI or other advanced neuroimaging, persistent post-concussive symptoms, or long-term neurological, cognitive, and/or neuropsychiatric complications. 75 –78 On the other hand, these considerations suggest a broader clinical application of a biomarker-based strategy for diagnosis and management of mTBI. Biomarkers could be used to provide guidance for prognostic groupings, to refine risk stratification, and to inform and guide different management and treatment decisions including indications for advanced MRI techniques (diffusion tensor imaging [DTI], susceptibility weighted imaging [SWI], functional connectivity MRI [fcMRI]), enrollment into clinical trials, and closer monitoring and follow-up of mTBI patients.
From a clinical perspective, biomarkers are not useful if they do not provide real-time decision support for diagnosis of mTBI at the bedside in the ED. A successful approach to the rapid incorporation into routine patient care will be to develop an automated multiplex point of care (POC) device, capable of providing accurate measurements to the clinician at a reasonable cost and with short turnaround times (∼15–20 min). 52,53
The studies discussed in this review focus primarily on adult patients. There is, however, a growing interest in using biomarkers to optimize diagnosis and management of pediatric mTBI, because of the high risk of TBI in children ≤4 years of age, the difficult functional assessments, and the radiation exposure at a young age with ensuing increased cancer risk. 75,79,80 Future studies and systematic reviews taking current and new evidence into account are urgently needed to elucidate the role of biomarkers and establish their clinical utility in this special and vulnerable population.
Several potential limitations merit consideration. Patient selection is a critical aspect in reviews of test accuracy, as it can alter the spectrum of disease and non-disease and the prevalence in the population, strongly impacting test accuracy. 67 Given the heterogeneous and polymorphous nature of TBI, in particular at the milder end of the spectrum, the definition of mTBI adopted in the included studies has been inconsistent and sometimes controversial. For example, focal neurological deficit has been considered either as an inclusion or as an exclusion criterion (Table S2). This diagnostic uncertainty may have introduced different biases. This is an issue that we cannot solve in this review, as we had to rely on the criteria that were listed in the included studies; nonetheless, we were able to assess the robustness of the findings using sensitivity analysis, which even demonstrated an improvement in S100B performance (Fig. S4).
However, with respect to selection of patients and study design, our group endorses the importance of methodological rigor, and advocates the use of standardized protocols and a prespecified set of data analyses both as a means to reduce related biases and inadequate reporting, and as a mandatory prerequisite to ensure successful validation and implementation of TBI diagnostic biomarkers. Careful consideration of sample size planning, based on assay precision, clinical significance, and regulatory considerations, is also necessary. Involvement of regulatory bodies in driving forward harmonization and standardization is considered essential. A major step forward in this direction is the recently established collaboration between researchers and the United States Food and Drug Administration (FDA) in the context of the TBI Endpoints Development (
Further, despite the broad adoption by the scientific community of the STARD statement (Standards for Reporting of Diagnostic Accuracy studies), 81 we found a number of studies with poor or inconsistent reporting of important information, including patient and specimen characteristics, assay methods, handling of missing data, and statistical analysis methods, in addition to suboptimal descriptions of study findings, which hampered our assessment of potential for bias and interpretation of the results. Our observations are important in raising awareness of key reporting issues in many of the TBI diagnostic studies. The STARDdem Initiative recently proposed an implementation of the STARD statement with guidance pertinent to studies of cognitive disorders, which is expected to contribute to the development of Alzheimer biomarkers. 82 A similar initiative for TBI biomarker studies could increase transparency and the quality of information provided by such studies, enabling evaluation of internal and external validity and, consequently, a more effective translation and application of their findings to clinical practice.
Harmonization and standardization of biomarker assays that can reliably quantify biomarkers with high analytical precision are critical to ensure that measurements are reproducible and consistent across different analytical platforms and multiple laboratories.
Conclusion
Based on this review, we found that measurement of S100B can help inform decision making in the ED with respect to the selection of adults with an mTBI for CT scan, possibly safely reducing resource use. Conversely, there is little evidence for clinical application of GFAP, UCH-L1, NSE, tau, or neurofilaments. However, much work remains to evaluate factors that may influence biomarker levels, and a critical examination is required of the implications for actual management, clinical impact, and health economics. We also found serious problems in the design, reporting, and analysis of many of the studies, emphasizing the importance for the research community to establish methodological standards and acquire extensive high-quality data for TBI biomarker validation. This is an essential prerequisite for drawing firm conclusions about the performance of tests based on these biomarkers and their clinical utility.
Finally, through the extensive and critical review of the existing TBI biomarker literature, and state-of-the-science discussions with key opinion leaders and subject matter experts, members of our work group collaborated to evaluate the evidence necessary to demonstrate clinical utility of TBI biomarkers, to identify critical gaps for advancing the field, and to lay the foundation for a “living” TBI biomarker registry capable of providing an up-to-date list and information on biomarker studies and their results (see Box). Such a strategy, helping to foster collaboration, developing the high levels of evidence needed to support analytical validity and clinical utility, and improving the quality of assessments of novel candidate biomarkers, should establish the solid ground needed for changing biomarker research from data that informs into data that transforms, turning knowledge into a new medical practice.
Footnotes
Acknowledgments
This work was supported by the European Union 7th Framework Programme (FP7) (CENTER-TBI; Grant No. 602150) and the Hungarian Brain Research Program (Grant No. KTIA 13 NAP-A-II/8). We thank Dr. Ornella Clavisi for assistance with the search strategy.
Author Disclosure Statement
Dr. Wang is a former employee of Banyan Biomarkers Inc. and owns stock. Dr. Wang also receives royalties from licensing fees, and as such, Dr. Wang may benefit financially as a result of the outcomes of this research or work reported in this publication. There are no other disclosures to report.
Appendix: Search Strategy
References
Supplementary Material
