Abstract
Clinical trials of therapies for acute traumatic spinal cord injury (tSCI) have failed to convincingly demonstrate efficacy in improving neurologic function. Failing to acknowledge the heterogeneity of these injuries and under-appreciating the impact of the most important baseline prognostic variables likely contribute to this translational failure. Our hypothesis was that neurological level and severity of initial injury (measured by the American Spinal Injury Association Impairment Scale [AIS]) act jointly and are the major determinants of motor recovery. Our objective was to quantify the influence of these variables when considered together on early motor score recovery following acute tSCI.
Eight hundred thirty-six participants from the Rick Hansen Spinal Cord Injury Registry were analyzed for motor score improvement from baseline to follow-up.
In AIS A, B, and C patients, cervical and thoracic injuries displayed significantly different motor score recovery. AIS A patients with thoracic (T2-T10) and thoracolumbar (T11-L2) injuries had significantly different motor improvement. High (C1-C4) and low (C5-T1) cervical injuries demonstrated differences in upper extremity motor recovery in AIS B, C, and D. A hypothetical clinical trial example demonstrated the benefits of stratifying on neurological level and severity of injury.
Clinically meaningful motor score recovery is predictably related to the neurological level of injury and the severity of the baseline neurological impairment. Stratifying clinical trial cohorts using a joint distribution of these two variables will enhance a study's chance of identifying a true treatment effect and minimize the risk of misattributed treatment effects. Clinical studies should stratify participants based on these factors and record the number of participants and their mean baseline motor scores for each category of this joint distribution as part of the reporting of participant characteristics. Improved clinical trial design is a high priority as new therapies and interventions for tSCI emerge.
Introduction
Given the numerous clinical trials over the past 20 to 30 years and the acknowledged lack of progress in identifying clinically effective treatments for tSCI, 5,6 one has to question whether successful therapies have eluded us simply because the candidate treatments were ineffective, or whether the methodological approach taken by these trials is flawed.
Researchers who translate neuroprotective therapies in traumatic brain injury and stroke have experienced similar challenges in establishing efficacy in clinical trials and have attributed this to multiple etiologies, including a lack of acknowledgement of the pathoanatomic and pathophysiologic heterogeneity of the injury processes and a poor understanding of the most influential baseline prognostic indicators. 7,8
Controlling for bias requires the identification and quantification of variables that strongly influence outcome. Adjustment for imbalances in these strong predictive variables through stratification at the time of recruitment would increase the power of a study, something of substantial relevance in the neurotrauma literature. 4,9 It would also improve the chances of detecting a true therapeutic effect and, most importantly for the high-profile and vulnerable tSCI population, increase the likelihood that a statistically significant effect represents a true effect.
Randomized controlled trials (RCTs) in acute tSCI research are hampered by small sample sizes due to immense recruitment issues 10; therefore, large observational cohort studies are likely to play a major role in future tSCI research. It is as important to ensure balanced representation of confounding variables in observational study cohorts as it is in RCTs.
SCI has the benefit of a reproducible measure of neurological recovery: voluntary motor function. This is the most important recovery target for patients 11 and a measure of impairment that strongly correlates with functional outcome. 12 Motor recovery can be reliably measured using the International Standards for Neurologic Classification of Spinal Cord Injury (ISNCSCI), a widely adopted grading scale accepted by the major international spinal cord societies. 13 Using ISNCSCI motor scores as an outcome is complicated by the fact that injuries at different spinal levels will have differences in the potential measurable deficits and thus different potential for recovery. The ISNCSCI measures motor function only at the C5-T1 and L2-S1 neurological levels, so recovery of motor function is not equally and comprehensively detectable by the ISNCSCI motor examinations. Further, there is an increasingly prominent ceiling effect on neurological recovery as the level of injury moves more caudally. Finally, it is possible that the observed recovery of motor function is not comprehensively detected by ISNCSCI as recovery occurs at varying rates and degrees within the various anatomic regions of the spinal cord (upper vs. lower cervical and thoracic vs. thoracolumbar).
Despite these obvious limitations (the fact that motor score does not fully encompass global spinal cord function and the statistical idiosyncrasies of motor scores as described by numerous authors 14–19), we chose to analyze early motor recovery to identify and characterize those covariates that will most profoundly influence prognostic models and thus impact the design, conduct, and analysis of potential future clinical trials.
Although motor recovery following tSCI may be influenced by many variables, 20 we selected the two dominant predictive variables: the neurological level of injury and the baseline severity of the neurological impairment. 19,21 To represent the neurological severity of the initial injury, we use the American Spinal Injury Association Impairment Scale (AIS) as a surrogate. 19 Clinical trials have used AIS as a primary outcome measure. 22,23 We categorized the neurological level of injury into four groups: high cervical (C1-C4); low cervical (C5-T1); thoracic (T2-T10); and thoracolumbar (T11-L2), basing these on the ISNCSCI last normal neurological level. There is no consensus in current or prior clinical trials on how to quantify and characterize the influence of neurological level and severity of injury, 14,17,24 and we sought to investigate their influence on motor score recovery.
The purpose of this paper was to quantify the prognostic influence of neurological level and severity of injury on the magnitude of early motor recovery in a large observational cohort of acute tSCI patients to guide clinical scientists in recognizing the key prognostic variables for which tSCI clinical trials should stratify their participants. We proposed a standardized stratification and reporting template for clinical trials and provided a hypothetical example to demonstrate the impact of these prognostic variables on the design and analysis of tSCI clinical trials.
Methods
This Canadian multicenter observational study utilized data prospectively collected for the Rick Hansen Spinal Cord Injury Registry (RHSCIR). This ongoing cohort study recruited patients with acute tSCI presenting to one of 31 RHSCIR sites located in 16 cities representing nine of 10 Canadian provinces. Patients presenting to any RHSCIR site within one month of an acute tSCI are eligible for consent and enrollment. Non-traumatic etiologies of SCI (infection, neoplasm, iatrogenic, acute vascular) were excluded. The specific details of the data collection and registry structure are reviewed elsewhere. 25
The subjects were recruited to the RHSCIR from 2004 to December 2012. The final follow-up neurological examination was performed prior to March 31, 2013. We included patients who had a baseline ISNCSCI neurological assessment performed up to a maximum of 30 d from injury, and a follow-up assessment at least 30 d after their baseline assessment. These broad inclusion criteria were selected for several reasons. Unlike a clinical trial where there are set times for assessments, registry data reflects clinical practice and the baseline neurological assessment is performed within a wider range, usually at the time of presentation to a specialized SCI center. Further, some clinical trials in SCI recruit in the very early acute phase while others recruit sub-acutely; thus, we included baseline examinations up to 30 d post-injury to enhance the generalizability of our results.
The change in upper extremity motor score (UEMS; maximum, 50) and change in total motor score (TMS; maximum, 100) were defined as the difference between the most recent follow-up and baseline respective motor scores. 26 Physicians, nurse practitioners, and physiotherapists who had been extensively trained in the ISNCSCI neurological examination performed detailed baseline and follow-up neurological assessments. The date and time of injury and each neurological assessment were recorded.
The neurological level of injury was defined as the most caudal segment with normal neurological function according to the ISNCSCI. The neurological level of injury was subdivided into high cervical (C1-C4), low cervical (C5-T1), thoracic (T2-T10), and thoracolumbar (T11-L2) based upon the neuroanatomical and physiological variability of these regions and the literature. 19,21 The division of cervical into those with measurable motor efferent roots (C5-T1) and those without measured motor segments (C1-C4) is intuitive. Unlike the thoracic levels for which no measurable motor correlates exist, many segments through the thoracolumbar region (T11-L2) contain within the cord motor cell bodies whose function can be reliably measured using the ISNCSCI. Patients with a neurological level below L2 were excluded since these were likely to be predominantly cauda equina injuries.
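The four-way grouping of neurological levels described above can be written as a small lookup. The following is a minimal sketch only: the function name `level_region` and the string format of the level (for example "C5") are assumptions made for illustration and are not part of the registry's data dictionary.

```python
import re

def level_region(level: str) -> str:
    """Map an ISNCSCI neurological level (e.g. 'C5', 'T11') to one of the
    four regions used in this study. Levels below L2 raise ValueError,
    mirroring their exclusion from the analysis."""
    match = re.fullmatch(r"([CTLS])(\d{1,2})", level.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized level: {level!r}")
    segment, number = match.group(1), int(match.group(2))
    if segment == "C" and 1 <= number <= 4:
        return "high cervical"
    if (segment == "C" and 5 <= number <= 8) or (segment == "T" and number == 1):
        return "low cervical"
    if segment == "T" and 2 <= number <= 10:
        return "thoracic"
    if (segment == "T" and 11 <= number <= 12) or (segment == "L" and 1 <= number <= 2):
        return "thoracolumbar"
    raise ValueError(f"level {level!r} is not in one of the four analyzed regions")

print(level_region("C4"), "|", level_region("T1"), "|", level_region("T11"))
```

Raising an error for levels below L2, rather than returning a fifth category, keeps probable cauda equina injuries out of any downstream tabulation by construction.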
The severity of neurological impairment was defined according to the AIS with grades from A (complete injury, no sensory or motor function is preserved in the sacral segments S4-5) to D (motor function is preserved below the neurological level). The AIS has been used as a primary outcome measure 22,23 and has been proposed as a surrogate measure of injury severity. 19
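Crossing the four AIS grades with the four level groups gives a 16-cell joint distribution. As an illustration of how a trial might tabulate participant counts and mean baseline motor scores per cell, here is a hedged sketch; the record keys (`ais`, `region`, `baseline_tms`) and the function name `joint_table` are hypothetical, and the three example records are invented.

```python
from collections import defaultdict
from statistics import mean

AIS_GRADES = ("A", "B", "C", "D")
REGIONS = ("high cervical", "low cervical", "thoracic", "thoracolumbar")

def joint_table(participants):
    """Return {(ais, region): (count, mean baseline TMS or None)} for all 16 cells."""
    baseline = defaultdict(list)
    for person in participants:
        cell = (person["ais"], person["region"])
        if cell[0] not in AIS_GRADES or cell[1] not in REGIONS:
            raise ValueError(f"participant falls outside the 4x4 classification: {cell}")
        baseline[cell].append(person["baseline_tms"])
    table = {}
    for ais in AIS_GRADES:
        for region in REGIONS:
            scores = baseline.get((ais, region), [])
            # Empty cells are reported explicitly rather than silently dropped.
            table[(ais, region)] = (len(scores), mean(scores) if scores else None)
    return table

# Invented records, for illustration only.
example = [
    {"ais": "A", "region": "thoracic", "baseline_tms": 50},
    {"ais": "A", "region": "thoracic", "baseline_tms": 52},
    {"ais": "C", "region": "high cervical", "baseline_tms": 20},
]
for cell, (count, avg) in joint_table(example).items():
    if count:
        print(cell, count, avg)
```

Reporting every cell, including empty ones, is a deliberate choice here: an empty or sparse cell is itself informative when two study arms are compared.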
Statistical analysis
Parametric tests were used if variables had a normal distribution and non-parametric tests were used if either the data were not normally distributed or if there was a small sample size. Chi-square or Fisher's exact tests were used for categorical data, depending upon the number of observations in each cell.
To evaluate the effect of the injury severity and the level of injury on changes in motor score, subgroup analyses were performed using a general linear model. A one-way between-subjects analysis of variance (ANOVA) was conducted when there were three or more groups. Post hoc comparisons using the Duncan and Scheffé tests were used to detect any differences between the level/severity-based subgroups. Comparisons between two groups were analyzed using independent t-tests with Levene's test to assess the equality of variance. Mann-Whitney U and Kruskal-Wallis tests were used for comparing two groups or more than two groups, respectively, if the data were not normally distributed. We acknowledge the work of Geisler and his colleagues 19,27 that described how motor and sensory scores do not follow the assumptions underlying normal theory and how pooling data from different strata may result in different probability distributions and give misleading results. We performed our statistical analyses specific to the distribution of data in each individual subgroup, thus minimizing heterogeneity and analyzing more statistically homogeneous and clinically relevant subgroups. A p value of ≤0.05 was considered statistically significant. Data analysis was conducted using SPSS (version 20) and SAS (version 9.3).
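For readers unfamiliar with the rank-based test named above, the Kruskal-Wallis H statistic can be sketched in a few lines. This is an illustration only and not the SPSS or SAS routine used in the study: it omits the tie correction that statistical packages apply (motor scores contain many ties), and the function names are assumptions.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, WITHOUT tie correction, for a list of samples."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Toy data, for illustration only.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For large enough samples, H is compared with a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups; with four groups the critical value at α = 0.05 is 7.815.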
A hypothetical SCI clinical trial scenario was created by randomly assigning participants from our RHSCIR study cohort to two equal groups (Group 1 and Group 2) using a computerized random number generator. The primary outcome in this hypothetical trial was an improvement in total motor score. The scenario was created not as a demonstration of the fundamental statistics and epidemiology of type I, or alpha error, but to demonstrate, using real world data, the methodological issue that we believe is plaguing many small SCI clinical trials. We used this example to highlight how acknowledging the heterogeneity in our proposed categories can explain imbalances that may lead to statistical error and how the reader, reviewer, and investigator can address this inadvertent selection bias through either a priori stratification or subsequent balancing of the heterogeneous participant population.
The hypothetical example also is presented to provide a real-world example of how conventional reporting of study participants is inadequate at exposing the imbalances in study populations. We demonstrate how the acknowledgment of our classification categories aids in appropriately powering trials.
Results
From a total population of 980 patients who met our inclusion criteria, 144 (14.7%) patients had one or more motor segments with missing individual motor scores, leaving a final cohort of 836 (85.3%) patients for our analysis. Table 1 describes the characteristics of the patient population and their neurological examinations. Institutional and university research ethics board approval was obtained at each of the centers.
Table 1 abbreviations: SD, standard deviation; AIS, American Spinal Injury Association Impairment Scale.
The mean change in TMS and UEMS of all individuals in the cohort was sub-classified based upon the four neurological levels of injury and the initial AIS grade (A through D). This forms the basis of what we call the “Canadian Classification” and is seen in Table 2.
Table 2 abbreviations: AIS, American Spinal Injury Association Impairment Scale; SD, standard deviation.
There was evidence of non-Gaussian distribution in motor score change; therefore, we used the Kruskal-Wallis test to compare the four AIS groups. There were significant differences among the four groups (p<0.001). Post hoc testing (Mann-Whitney U with Bonferroni correction) confirmed that the four groups should be considered separately. In each of the four AIS groups we also tested the normality of the data. In these data, AIS A and B were not normally distributed; therefore, the Kruskal-Wallis test was used for comparison of the four anatomical regions within each AIS group. Thus, in AIS A and B participants, we used the Mann-Whitney U as a post hoc test with Bonferroni correction to adjust the p value.
One-way ANOVA was used for comparing the four anatomical regions in AIS C and D participants, as the data distribution was normal. Because the result in AIS C was significant, two post hoc tests (Duncan and Scheffé) were then applied.
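The Bonferroni correction applied to the Mann-Whitney U post hoc comparisons divides the significance threshold by the number of comparisons made. The sketch below assumes that all pairwise comparisons among four groups were tested, which the text does not state explicitly; the function name is hypothetical.

```python
from itertools import combinations

def bonferroni_pairs(group_names, alpha=0.05):
    """List all pairwise comparisons and the Bonferroni-adjusted threshold."""
    pairs = list(combinations(group_names, 2))
    return pairs, alpha / len(pairs)

# Assumption: every pair of the four AIS grades is compared.
pairs, adjusted_alpha = bonferroni_pairs(["A", "B", "C", "D"])
print(len(pairs), "comparisons; per-comparison threshold:", round(adjusted_alpha, 4))
```

Under that assumption, four groups give six pairwise comparisons, so each individual comparison would need p ≤ 0.05/6 (about 0.0083) to be declared significant.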
A significant difference in mean total motor score recovery between the neurological levels of injury was identified for AIS A, B, and C (p≤0.05; Table 2). All AIS D injuries demonstrated similar TMS improvements with no significant differences among levels of injury (p=0.20), although there was a larger motor score improvement in high cervical AIS D injuries.
A post hoc analysis of TMS recovery by neurological level and AIS demonstrated that when comparing high and low cervical levels, changes in TMS were not significantly different for AIS A, B, and C; however, significant differences in change in UEMS between high and low cervical levels were seen in AIS B (14.3 vs. 9.5), C (20.5 vs. 12.7) and D (11.4 vs. 7.5).
In AIS A thoracic (T2-T10) and thoracolumbar injuries (T11-L2), there was a significant difference in TMS improvement. No significant differences were observed between thoracic and thoracolumbar levels for the incomplete AIS categories (B, C, and D).
Hypothetical clinical trials and sample size
A computerized random number generator selected 64 participants, randomly assigning them to two groups in an attempt to mimic recruitment into a hypothetical trial similar in size to previously published trials. 18,28–30 We compared these two groups for what are commonly reported characteristics of study participants (age, gender, AIS on admission, neurological level of injury, and mechanism of injury, as outlined in Table 3) and found no significant differences. Groups 1 and 2 demonstrated a statistically significant (p=0.05) difference in motor score improvement. This is not a true effect but represents a type I error, which would be expected to occur in one of every 20 samplings despite randomized cohort selection. When we repeated this sampling process 20 times, two of the 20 occasions demonstrated “significant” differences in motor score improvement.
Table 3 abbreviations: SCI, spinal cord injury; AIS, American Spinal Injury Association Impairment Scale; SD, standard deviation.
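How unusual is it to see two "significant" results in 20 repetitions? A short binomial calculation gives a rough answer. It is a sketch under two stated assumptions that the study does not verify: each repetition is treated as an independent test, and each has exactly a 5% false-positive rate (repeated samplings from one cohort are not strictly independent).

```python
from math import comb

def prob_at_least(k, n=20, alpha=0.05):
    """P(X >= k) for X ~ Binomial(n, alpha): k or more false positives in n tests."""
    return 1.0 - sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i) for i in range(k))

print("expected false positives:", 20 * 0.05)
print("P(at least 1 of 20):", round(prob_at_least(1), 3))
print("P(at least 2 of 20):", round(prob_at_least(2), 3))
```

Under these assumptions, at least one false positive occurs in roughly 64% of such 20-test series and at least two in roughly 26%, so the observed two of 20 is well within what chance alone produces.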
When the same hypothetical groups of patients are presented in the form of the Canadian Classification (Table 4) a significant imbalance is visible, particularly in relation to those neurological level and injury severity subgroups where there is the greatest potential for neurological improvement; specifically AIS B and C high and low cervical categories.
Table 4 displays, for the two hypothetical study cohorts (Group 1 and Group 2), the number of patients in each AIS and anatomic region category of the Canadian Classification. The anticipated mean motor score recovery for each category in our 4×4 classification is shown in brackets and is obtained from Table 2. Imbalances that are not apparent in the far right column or bottom rows can be identified in the joint distribution, particularly in AIS B high cervical, where there are four more patients in Group 1 than in Group 2, each with a potential to improve by 29 motor points.
Group 1 also has two additional patients in each AIS C cervical group, each with a potential motor score improvement of 35–41 points, thus explaining why the follow-up motor score in Group 1 is more than 10 points higher than that in Group 2. This demonstrates that small numerical imbalances within this joint 4×4 distribution (particularly in the subgroups with high potential for neurological motor improvement) can have a dramatic influence on the neurological recovery observed in a study group, as seen here in Group 1.
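A back-of-envelope calculation shows how a handful of patients in high-recovery cells can shift a group mean. It uses only the figures quoted above, together with several assumptions: each group holds 32 patients (64 split equally), one AIS C cervical cell is assigned 35 points and the other 41, and `displaced_recovery` is a hypothetical parameter for the anticipated recovery of the patients who occupy the corresponding places in Group 2. It is not an attempt to reproduce the observed between-group difference.

```python
def mean_shift(extra_cells, group_size, displaced_recovery=0.0):
    """Shift in a group's mean motor recovery caused by surplus patients.

    extra_cells: list of (surplus patients, anticipated points per patient).
    displaced_recovery: assumed recovery of the patients they replace.
    """
    surplus = sum(n * (points - displaced_recovery) for n, points in extra_cells)
    return surplus / group_size

# Surplus patients in Group 1, using figures quoted in the text.
cells = [(4, 29),   # AIS B high cervical
         (2, 35),   # AIS C cervical, lower quoted value
         (2, 41)]   # AIS C cervical, upper quoted value

print(mean_shift(cells, group_size=32))                          # replaced patients recover 0 points
print(mean_shift(cells, group_size=32, displaced_recovery=5.0))  # replaced patients recover 5 points
```

Even under these simplifications, eight patients out of 32 account for a shift of several motor points in the group mean, which is of the same order as the effect sizes small trials are designed to detect.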
Further, we assessed the power of our hypothetical study in Figure 1 by performing sample size estimations. We calculated the sample size and power necessary to demonstrate a TMS improvement of 10. We estimated the standard deviation (SD) in the study groups based on our study population (SD=15) and the literature (difference in SD=10–35). 16 The anticipated sample size to achieve a power of 0.8 would be 180 participants with random sampling of our cohort. When we repeated this exercise while stratifying for conventional characteristics (age, gender, AIS), we required 102 total participants. Finally, when we stratified subject selection based on the Canadian Classification, which considers the joint distribution of both neurological level and severity of injury together, 78 participants were required.

Figure 1. The sample size and power were calculated for a hypothetical study based on a 10-unit change in total motor score for three different sampling scenarios. For each graph, the associated standard deviations (SDs) for a 10-unit change in motor score are reported based on data from the Rick Hansen Spinal Cord Injury Registry and other published papers. Curve 1 uses simple random sampling, Curve 2 stratifies for conventional patient characteristics (age, gender, American Spinal Injury Association Impairment Scale), and Curve 3 is based on stratifying patients based on the Canadian Classification.
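The relationship between outcome variability and required sample size can be illustrated with the textbook normal-approximation formula for a two-sided comparison of two means with equal group sizes, n per group = 2(z₁₋α/₂ + z_power)² SD² / δ². This is a sketch of the general principle, not the procedure used to generate Figure 1, which the text does not describe in detail, so its output should not be expected to match the 180, 102, and 78 participants reported. The function name and the SD values other than 15 are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Approximate participants per group to detect a mean difference `delta`
    when the outcome has standard deviation `sd` (two-sided test, equal groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2)

# delta = 10 motor points, as in the hypothetical trial; the SDs are illustrative.
for sd in (15, 20, 24):
    print(f"SD={sd}: {n_per_group(sd, delta=10)} per group")
```

Because the required n grows with the square of the SD, any stratification that reduces within-stratum variability in motor recovery lowers the sample size disproportionately, which is the mechanism behind the ordering of the three curves.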
Discussion
We hypothesized that the magnitude of early motor recovery following acute tSCI can be more accurately estimated when both the neurological level and severity of the initial injury are used in a joint distribution to categorize these injuries. We selected four distinct neurological levels of injury—high cervical, low cervical, thoracic, and thoracolumbar—and used the patient's initial AIS grade as a measure of neurological severity of injury.
We analyzed the early motor recovery of 836 patients and verified that there are differences in early motor recovery between the four levels of injury that are unique to their initial AIS grades. Although it has been recognized that level and severity of injury are the two predominant predictors of neurological outcome, 19,21 the strong influence of their joint distribution, our proposed categorization of neurological levels, and the impact this could have on RCT design and analysis of observational datasets have not been previously reported.
The significant impact of our study is that we identify the importance of the joint distribution of injury level and severity beyond what is commonly a simple univariate reporting of these characteristics in the conventional tables included under the category of study participant characteristics in most clinical trials. Unrecognized imbalances between categories within the Canadian Classification have plagued SCI translational efforts for many years.
We propose that all SCI trials should report on the level and severity of injury in our proposed joint distribution in their description of participant characteristics (Table 2).
While motor score improvements in some categories were very similar, patients with the same initial AIS grade, but a different neurological level had total motor score change differences that varied from a low of 5 motor points in AIS D to 20 motor points for AIS B and C (Table 2). This magnitude of difference in motor score improvement would be deemed to be clinically important. 16,26 When the patient's initial AIS grade is A, B, or C, we have shown that mean early motor score improvements are significantly different for cervical and thoracic injuries.
We demonstrated that the high and low cervical levels of injury display different magnitudes of UEMS recovery in individuals with incomplete tSCI (AIS B, C, and D). Distinguishing between high and low cervical levels of injury reached significance for UEMS in AIS C and D, while a similar trend was seen in AIS B (p=0.05). High cervical injuries benefit from a larger number of segments available to recover motor function than would a low cervical injury of similar severity and this is particularly relevant for individuals with incomplete neurological injuries. The distinction between high and low cervical incomplete patients is not often specified in clinical trial reporting. 22,23,29,31 Upper extremity motor recovery is a clinically and functionally important outcome 32 and thus the significant differences in recovery potential between high and low cervical neurological levels should be distinguished and reported separately for each AIS, particularly when incomplete patients are included in a clinical trial or observational study.
We identified a significant difference in motor recovery between thoracic (T2-T10) and thoracolumbar (T11-L2) injuries when the initial AIS grade is A. This is not a surprising finding given the unique neuroanatomy of the region of the conus medullaris and cauda equina, and the way in which neurological recovery has been shown to be dependent upon this unique anatomy. 33
Our analysis defines the level of injury based on a validated and universally accepted standard, the ISNCSCI neurological level. We have subdivided the neurological level of injury into four groups that are intuitive based upon our understanding of neuroanatomy and physiology. We have used the validated and accepted AIS as a surrogate for severity of neurological injury.
We acknowledge that this study includes a wide range of baseline neurological examination times and does not report on long-term or final neurological outcome. There has been substantial debate on how early a baseline neurological examination can be reliably performed in neuroprotective trials, and this will be the subject of further investigation by our group and others. Including the broader timing of baseline neurological examinations generalizes our results to acute neuroprotective and subacute neuroregenerative trials, although our intent was not to mimic a clinical trial, but to analyze registry data, which more closely represents actual clinical practice. It is estimated that over 70% of recovery occurs while the tSCI patient is still hospitalized, and the majority of rapid recovery occurs within the first three months following injury. 16,34 Six-month neurological recovery has been used as a primary outcome in clinical trials and the majority of our patients had their follow-up neurological examination around the six month mark.
We also acknowledge that the time from injury to when the baseline neurological examination is performed can influence the magnitude and rapidity of motor recovery and this is the subject of a future study. This current analysis includes patients whose baseline examination is performed within 30 d of injury; however, we would anticipate that those with early neurological examinations (<24 h from injury) would demonstrate an even greater rate and magnitude of motor recovery than our current population. By including a broader range of baseline neurological examination times we have likely underestimated the effect of our classification on patients that are recruited into early intervention trials and have increased the generalizability of our results to acute and sub-acute intervention trials.
We believe that the national nature of our participant sample adds to the generalizability of our results, while the training of clinicians in the performance of the ISNCSCI standards adds to the reliability and accuracy of our results. 35 We also acknowledge that several of the categories in our classification are based on relatively small sample sizes. This is a consistent feature of all trials in SCI; despite national sampling, the numbers of some categories of tSCI, particularly incomplete injuries, are remarkably small. This reality is one that must be acknowledged by those who plan and design clinical trials.
We have provided an example of a hypothetical small clinical trial, similar in size to several reported clinical trials. 18,28–30 Our hypothetical trial identified a significant difference in motor recovery between two groups by chance alone, a finding that is not at all surprising since we set our level of significance at 5% and thus we would expect one false positive result in every 20 tests. The fundamental reason for presenting this hypothetical trial is to demonstrate, using an example of “real-world data” drawn from our population, that by presenting the study data in the joint distribution of our proposed 4×4 classification of SCI, a reader, reviewer, or investigator is able to identify and quantify the imbalances between the two randomly selected cohorts. Essentially, we have identified subgroups of tSCI patients in which there are predictable and substantial differences in motor recovery. Between some subgroups, the differences are less profound. Our hypothetical example demonstrates the importance of the joint distribution of level and severity, and how imbalances between the cells of our proposed classification may not be readily apparent to clinical trialists when reporting study participants in the conventional fashion.
It is highly plausible that motor recovery occurs differently at the various levels of the spinal cord, and that the rate and magnitude of recovery also varies with the severity of initial injury. In addition to this variability, the tool that we use to measure this recovery (the ISNCSCI motor scale) records motor improvements differently depending upon the level and severity of injury. Given that the ISNCSCI only records ten motor segments each in the upper and lower extremities, a similar numerical magnitude of motor improvement may be distributed in very different ways in the same patient and between different patients, leading to a lack of predictability of what is clinically meaningful when a numerical change in motor score is reported.
Our hypothetical trial also suggests that the best strategy to avoid these imbalances is to stratify patients based on neurological level of injury and severity of neurological impairment. This will reduce the variability in the motor recovery outcome and also reduce the numbers needed for adequate power. Stratification may adversely influence the feasibility of recruitment, as well as the generalizability of a study. If clinical trials use this classification to report their study composition, then comparing datasets across clinical trials and observational studies would be facilitated.
It is likely that beneficial therapies may demonstrate an effect in only one or several of the anatomic and injury severity categories. Some therapies might demonstrate their effect in AIS A patients, while others may have a mechanism of action that is more effective for incomplete injuries. Further, with the varying contribution of grey and white matter to different neurological levels of spinal cord injury, a therapy directed at re-myelination would likely express a different clinical response than a cell-based or neuroprotective strategy.
It is only by acknowledging the heterogeneity of tSCI that clinicians will be able to personalize their approach to the treatment of tSCI. This study is simply a first step in understanding the heterogeneity inherent in tSCI. We anticipate that further observations, collaborations, validations, and replication will be required to refine these categories and how they are to be used to improve the methodology of tSCI observational and randomized clinical trials.
Acknowledgments
We would like to specifically acknowledge the two reviewers who provided substantial input that has improved the quality of this manuscript. We would also like to acknowledge the Rick Hansen Spinal Cord Injury Registry Network, Karen Ethans, and Colleen O'Connell.
Author Disclosure Statement
Funding was provided by the Rick Hansen Institute and Health Canada. No competing financial interests exist.
