Abstract
Background. The reliable stratification into homogeneous subgroups and the prediction of future clinical outcomes within heterogeneous neurological disorders are particularly challenging tasks. Nonetheless, they are essential for the implementation of targeted care and effective therapeutic interventions. Objective. This study was designed to assess the value of a recently developed regression tool from the family of unbiased recursive partitioning methods in comparison to established statistical approaches (eg, linear and logistic regression) for predicting clinical endpoints and for prospective patient stratification for clinical trials. Methods. A retrospective, longitudinal analysis of prospectively collected neurological data from the European Multicenter study about Spinal Cord Injury (EMSCI) network was undertaken on C4-C6 cervical sensorimotor complete subjects. Predictors were based on a broad set of early (<2 weeks) clinical assessments. Endpoints were based on later clinical examinations of upper extremity motor scores and recovery of motor levels, at 6 and 12 months, respectively. Prediction accuracy for each statistical analysis was quantified by resampling techniques. Results. For all settings, overlapping confidence intervals indicated that the prediction accuracy of unbiased recursive partitioning was similar to that of established statistical approaches. In addition, unbiased recursive partitioning provided a direct way of identifying more homogeneous subgroups. The partitioning is carried out in a data-driven manner, independently of a priori decisions or predefined thresholds. Conclusion. Unbiased recursive partitioning techniques may improve prediction of future clinical endpoints and the planning of future SCI clinical trials by providing easily implementable, data-driven rationales for early patient stratification based on simple decision rules and clinical read-outs.
Introduction
Traumatic spinal cord injury (SCI) is a heterogeneous disorder in terms of pathology, neurological deficits, and subsequent spontaneous recovery.1,2 Furthermore, seemingly comparable cord injuries (as classified by the American Spinal Injury Association [ASIA] Impairment Scale [AIS A-E]) can lead to a diverse range of neurological and functional recovery, especially after incomplete SCI.3 This is similar to other central nervous system disorders and makes reliable prediction of future outcomes challenging.
Despite this, the reliable prediction of future clinical endpoints is important to the implementation of targeted care and effective treatment options. In addition, reliably defining relatively homogeneous subgroups for clinical trials is important to accurately determining whether a therapeutic intervention provides a distinct benefit.4-6 In many situations, such as a study of the biological or functional activity of an experimental therapeutic in early phase clinical trials, it is desirable to enroll only subjects who are relatively homogeneous in terms of both their early neurological status and their prognosis for achieving a defined clinical endpoint (eg, future outcome). If trial participants have heterogeneous neurological and functional characteristics when assigned to a study arm, the contribution of a small number of participants may distort the overall results and their interpretation and may mask subtle treatment effects, thereby wasting subject and study resources.
Recently, in SCI, attempts have been made to create clinical algorithms for the prediction of long-term endpoints and for patient stratification.7-9 These algorithms relied on the statistical techniques of multiple linear and logistic regression. In this study, we compare these established statistical regression approaches7-9 with a recently developed unbiased recursive partitioning regression tool called Conditional Inference Tree (URP-CTREE),10 which directly identifies more homogeneous subgroups from an initial heterogeneous population. The aim was to compare the predictive accuracy of URP-CTREE against established regression models to predict future clinical endpoints from early neurological assessments, and to investigate the contribution of these methods to the stratification of cervical sensorimotor complete (AIS A) subjects into homogeneous subgroups.
Methods
Data Source
Data were obtained from the European Multicenter study about Spinal Cord Injury (EMSCI; http://www.emsci.org), an ongoing European network of SCI centers prospectively gathering data from subjects over the first year after traumatic SCI. The standardized assessment protocol tracks the neurological and functional status of patients during recovery from SCI. The EMSCI database was established in 2001 and has collected data from more than 2500 subjects during the past 12 years from 21 centers in 7 European countries.
Inclusion Criteria
The target population for this study included those EMSCI subjects who had cervical sensorimotor complete (AIS A) SCI with a Motor Level (from the right body side) at either C4, C5, or C6 as determined by a baseline assessment using the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI)11 within the first 2 weeks after SCI (Figure 1). Only subjects with a documented assessment and determination of the selected clinical outcome of interest at the respective endpoint were included.

Figure 1. Subject numbers and selection criteria as extracted from the EMSCI database.
Two separate analyses are reported. The first analysis evaluated the total bilateral Upper Extremity Motor Score (UEMS) at 6 months after cervical complete SCI and is referred to hereafter as Total-UEMS. The ISNCSCI motor score is determined by assigning to each key muscle group, which is innervated by and primarily identified with a specific spinal level, an integer between 0 (no detectable contraction) and 5 (active movement and a full range of movement against maximum resistance). Between C5 and T1 there are 5 representative “key” arm and hand muscles tested on each side of the body, for a maximum total upper extremity motor score of 25 + 25 = 50. The second analysis examined whether the subject achieved a 2-motor level improvement within the cervical cord (on either the left or right side) by 12 months after cervical complete SCI and is referred to as 2-motor level change. Subjects with a cervical complete SCI between C1 and C3 or between C7 and T1 were excluded from analyses, either because it is challenging to track the recovery of upper extremity motor scores in these subjects or because there was an insufficient number of subjects for statistical analysis.
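The scoring arithmetic described above can be illustrated with a minimal Python sketch. The function names `side_score` and `total_uems` and the example grades are hypothetical and are not part of any EMSCI or ISNCSCI tooling; the sketch assumes only the rules stated in the text (5 key muscles per side, integer grades from 0 to 5).

```python
KEY_MUSCLES_PER_SIDE = 5  # key arm and hand muscles between C5 and T1
MAX_GRADE = 5             # 0 = no detectable contraction, 5 = full strength


def side_score(grades):
    """Sum the motor grades of the 5 key upper extremity muscles of one side."""
    if len(grades) != KEY_MUSCLES_PER_SIDE:
        raise ValueError("expected one grade per key muscle (5 per side)")
    for g in grades:
        if not isinstance(g, int) or not 0 <= g <= MAX_GRADE:
            raise ValueError("each grade must be an integer between 0 and 5")
    return sum(grades)


def total_uems(left, right):
    """Total bilateral upper extremity motor score (range 0 to 50)."""
    return side_score(left) + side_score(right)


# Hypothetical subject: grades listed from C5 to T1 for each side.
example = total_uems(left=[5, 4, 1, 0, 0], right=[5, 3, 0, 0, 0])
maximum = total_uems(left=[5] * 5, right=[5] * 5)
print(example, maximum)
```

The maximum of 50 corresponds to the 25 + 25 stated in the text.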
Predictors and Clinical Endpoints
Potential clinical predictors (early ISNCSCI scores) and clinical endpoints (Total-UEMS and 2-motor level change at 6 and 12 months after SCI) were selected based on published literature12,13 and the clinical research experience of the authors. The set of predictors characterizes the neurological status of the subjects according to the criteria of the ISNCSCI examination,11 which is routinely assessed at all EMSCI specialized SCI care facilities within the first 2 weeks after injury (mean ± SD = 8.1 ± 4.7 days after injury). All predictors were collected according to EMSCI and ISNCSCI guidelines. Included predictors were age, the motor level (right body side), the bilateral sensory scores (light touch, pin prick), and motor scores (upper and lower extremity motor score), as well as information on the left and right side for motor and sensory zone of partial preservation (ZPP) below the respective motor or sensory level. The zone of partial preservation refers to those segments caudal to the motor or sensory levels where there is some preservation of impaired motor or sensory function.
Here, we present 2 analyses based on complementary clinical endpoints. These endpoints have been related to changes in neurological impairment and/or functional recovery (eg, independence in activities of daily living), and have been suggested as possible clinical outcome measures for acute and/or subacute clinical studies involving cervical sensorimotor complete (AIS A) subjects.14-16 Ancillary analyses for the bilateral Total-UEMS at 12 months and the 2-motor level improvement within the cervical cord (on either the left or right side) by 6 months after cervical complete SCI were also performed.
Unbiased Recursive Partitioning: Conditional Inference Trees
The unbiased recursive partitioning technique called conditional inference tree (URP-CTREE) is a tree-structured regression model based on sequential tests of independence between predictors (eg, early clinical characteristics) and a specified clinical endpoint (ie, future outcome).10 URP-CTREE divides an initial heterogeneous population into successively disjoint and more homogeneous pairs of subgroups with regard to the clinical endpoint of interest, and thus creates an algorithm for predicting future outcomes within more homogeneous subgroups.
URP-CTREE is based on 2 fundamental steps, which are repeated iteratively for each successive split of the initial heterogeneous population:
Step 1: Association of early predictors (subject’s characteristics) with the clinical endpoint (outcome). The algorithm assesses whether any early predictor is statistically associated with the selected clinical endpoint. This is performed by individually calculating the statistical association of each possible predictor–endpoint pair (no data are presumed to be normally distributed). Each association is assigned a P value corrected for multiple testing (ie, Bonferroni correction). If the initial null hypothesis of total independence between predictors and outcome cannot be rejected (no statistically significant association between any early predictor and the endpoint), the algorithm stops without producing any split of the initial population. Conversely, if the null hypothesis of independence can be rejected, meaning that at least one early predictor is significantly associated with the subsequent clinical endpoint, then the algorithm selects the predictor with the strongest statistical association (smallest P value) and passes it to step 2.
Step 2: Splitting procedure for defining more homogeneous pairs of subgroups. Once the most significant predictor has been selected (as expressed in step 1), the algorithm evaluates all possible dichotomous splits on this variable, each one inevitably producing 2 subgroups. The goodness of each split is evaluated by a two-sample linear statistic (eg, χ2 statistic for a binary outcome), to maximize the discrepancy between the newly formed subgroups. This partitions the initial population into 2 subgroups that are as distinct as possible.
Iterative steps: Recursively proceed to identify any additional early characteristics (predictors) that significantly predict the selected clinical endpoint. The recursive part of the algorithm starts over and the 2 fundamental steps (steps 1 and 2 listed above) are repeated separately for each of the 2 newly formed subgroups. The URP-CTREE calculations proceed until no more statistically significant predictors are associated with the selected endpoint (null hypothesis cannot be rejected).
Once the clinical endpoint and predictors are selected, the algorithm will determine any significant associations without allowing any further input or bias by the investigator. Conditional inference trees can be applied to all types of regression problems, and have already been successfully used in other clinical settings with heterogeneous patient populations, such as genetic marker–tumor association studies.17,18
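The 2 steps and their recursion can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation in the R package party: it handles only numeric predictors and a continuous endpoint, approximates the test of independence with a Monte Carlo permutation test on a centered linear statistic, and uses hypothetical names (`perm_p_value`, `best_split`, `grow_tree`) and synthetic data.

```python
import random
from statistics import mean


def perm_p_value(x, y, rng, n_perm=999):
    """Permutation P value for the association of x with y (no normality assumed)."""
    mx, my = mean(x), mean(y)
    xc = [xi - mx for xi in x]

    def stat(ys):
        # Absolute value of a centered linear (covariance-like) statistic.
        return abs(sum(a * (b - my) for a, b in zip(xc, ys)))

    observed = stat(y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def best_split(x, y, min_size=2):
    """Step 2: cutoff c (rule x <= c) that makes the 2 subgroups most distinct."""
    n = len(y)
    best_cut, best_stat = None, -1.0
    for cut in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        if min(len(left), len(right)) < min_size:
            continue
        # sqrt(nL * nR / n) * |difference in subgroup means|
        s = (len(left) * len(right) / n) ** 0.5 * abs(mean(left) - mean(right))
        if s > best_stat:
            best_cut, best_stat = cut, s
    return best_cut


def grow_tree(data, y, rng, alpha=0.05, min_size=2):
    """data maps predictor name -> list of values; y is the endpoint."""
    node = {"n": len(y), "mean": mean(y)}
    candidates = {k: v for k, v in data.items() if len(set(v)) > 1}
    if len(y) < 2 * min_size or not candidates:
        return node
    # Step 1: Bonferroni-adjusted P value for every predictor-endpoint pair.
    m = len(candidates)
    pvals = {k: min(1.0, m * perm_p_value(v, y, rng)) for k, v in candidates.items()}
    name = min(pvals, key=pvals.get)
    if pvals[name] > alpha:
        return node  # independence not rejected: stop, no split
    # Step 2: dichotomous split on the selected predictor.
    cut = best_split(data[name], y, min_size)
    if cut is None:
        return node
    left_idx = [i for i, v in enumerate(data[name]) if v <= cut]
    right_idx = [i for i, v in enumerate(data[name]) if v > cut]

    def subset(idx):
        return {k: [v[i] for i in idx] for k, v in data.items()}, [y[i] for i in idx]

    # Iterative steps: repeat steps 1 and 2 within each new subgroup.
    node.update(split=(name, cut), p=pvals[name],
                left=grow_tree(*subset(left_idx), rng, alpha, min_size),
                right=grow_tree(*subset(right_idx), rng, alpha, min_size))
    return node


# Synthetic data: the endpoint depends on "uems" through a step, not on "noise".
uems = list(range(20))
noise = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
outcome = [0] * 10 + [10] * 10
tree = grow_tree({"uems": uems, "noise": noise}, outcome, random.Random(0))
print(tree["split"], tree["left"]["mean"], tree["right"]["mean"])
```

Each terminal node stores its sample size and mean endpoint, which corresponds to the node-specific prediction for a continuous outcome. A production analysis would rely on an established implementation rather than on this sketch.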
Comparison of Statistical Methods
The recently developed URP-CTREE method is considered to directly identify more homogeneous study subgroups; however, its predictive accuracy needs to be compared against established statistical methods. Given the more continuous nature of the Total-UEMS endpoint, we compared multiple linear regression and the Least Absolute Shrinkage and Selection Operator (LASSO)19 with URP-CTREE. Given the binary nature of the endpoint for a 2-motor level change, we compared multiple logistic regression and LASSO with URP-CTREE. Linear and logistic regressions are well-known statistical techniques that have been previously employed in SCI research.7-9 For the purpose of this article, LASSO can be interpreted as a multiple regression model with built-in variable selection.19
For evaluating the accuracy of Total-UEMS prediction, the models were compared by computing the root mean square error (RMSE). RMSE is a frequently used measure of the difference between observed values and values predicted by a model. It is defined as the square root of the sum of squared differences between observed and predicted values divided by the total sample size.20 The URP-CTREE–based prediction for continuous outcomes is computed as the final node–specific mean. For assessing the accuracy of 2-motor level change prediction, the models were compared by computing the misclassification rate. The misclassification rate for a binary outcome is defined as the percentage of model-based outcome predictions that differ from the actually observed values.20 The URP-CTREE–based prediction for binary outcomes is the most likely outcome within the final node.
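Both accuracy measures follow directly from their definitions. The following Python sketch is illustrative only; the function names and the example values are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Square root of the mean of squared differences (continuous endpoint)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def misclassification_rate(observed, predicted):
    """Percentage of predictions that differ from the observed binary outcome."""
    wrong = sum(o != p for o, p in zip(observed, predicted))
    return 100.0 * wrong / len(observed)


# Hypothetical Total-UEMS values and 2-motor level change indicators (1 = achieved).
print(rmse([10, 20, 30], [12, 18, 30]))
print(misclassification_rate([1, 0, 0, 1], [1, 1, 0, 0]))
```

Lower values indicate better prediction for both measures.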
Following standard benchmarking procedures,21 both measures were based on 500 bootstrap iterations. All analyses were performed in the computing environment R,22 version 2.14.0, and based on the package party: A Laboratory for Recursive Partitioning.23
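A bootstrap evaluation of this kind can be sketched as follows. The text does not specify the details of the resampling scheme, so the sketch makes explicit assumptions: each model is fitted on a bootstrap sample and evaluated on the subjects left out of that sample (out-of-bag), and the interval is taken from approximate percentiles of the resulting RMSE values. The names `bootstrap_rmse` and `fit_mean` are hypothetical, and `fit_mean` is a deliberately trivial stand-in for a real model.

```python
import random
from math import sqrt
from statistics import mean, median


def bootstrap_rmse(x, y, fit, n_boot=500, seed=1):
    """Median and approximate 95% percentile interval of out-of-bag RMSE."""
    rng = random.Random(seed)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]     # sample with replacement
        out_bag = sorted(set(range(n)) - set(in_bag))     # subjects not drawn
        if not out_bag:
            continue
        predict = fit([x[i] for i in in_bag], [y[i] for i in in_bag])
        squared = [(y[i] - predict(x[i])) ** 2 for i in out_bag]
        errors.append(sqrt(mean(squared)))
    errors.sort()
    lower = errors[int(0.025 * (len(errors) - 1))]
    upper = errors[int(0.975 * (len(errors) - 1))]
    return median(errors), (lower, upper)


def fit_mean(xs, ys):
    """Trivial stand-in model: always predicts the training mean."""
    m = mean(ys)
    return lambda _x: m


# Synthetic data for illustration only.
values = list(range(30))
med, (lo, hi) = bootstrap_rmse(values, values, fit_mean, n_boot=200)
print(med, lo, hi)
```

Running the same procedure for each competing model on the same data yields the medians and intervals that can then be compared across methods.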
Results
Comparison of Statistical Methods for Predicting Clinical Endpoints
The prediction accuracy (RMSE) of Total-UEMS at 6 months after SCI is based on 500 bootstrap iterations and shown in Table 1. All 3 statistical approaches provide similar median and overlapping 95% confidence intervals (CIs) for RMSE.
Table 1. Root mean square error as a measure of prediction accuracy for Total-UEMS at 6 months after cervical sensorimotor complete (AIS A) SCI. No statistically significant difference in accuracy between the 3 methods was observed.
Abbreviations: 95% CI, 95% confidence interval.
Likewise, the examined statistical methods for predicting a 2-motor level change on either side of the cervical cord within the first 12 months after cervical complete (AIS A) SCI also provide similar statistical accuracy (Table 2). The difference here is that the selected clinical endpoint was a binary event. Based on 500 bootstrap iterations, Table 2 shows the 95% CI for the misclassification rate. All 3 statistical approaches provide similar medians and overlapping 95% CIs.
Table 2. Misclassification rate as a measure of prediction accuracy for 2-motor level change at 12 months after cervical sensorimotor complete (AIS A) SCI. No statistically significant difference in accuracy between the 3 methods was observed.
Abbreviations: 95% CI, 95% confidence interval.
Ancillary analyses for Total-UEMS at 12 months and 2-motor level change at 6 months provided similar results in terms of median and CI across the different methods. Therefore, for reasons of clarity, this article refers to the primary analyses only.
Direct Identification of More Homogeneous Study Subgroups
Figure 2 shows the conditional inference tree (URP-CTREE) for Total-UEMS at 6 months after cervical sensorimotor complete (AIS A) SCI.

Figure 2. Conditional inference tree for the endpoint Total-UEMS at 6 months after cervical sensorimotor complete (AIS A) SCI (N=122), using a broad set of neurological and functional predictors assessed within the first two weeks after injury. The upper part represents the sequential splits based on early predictors (nodes 1, 2, 4, 7); the lower part represents the achieved partition of the initial population into 5 more homogeneous subgroups, as represented by the final nodes (nodes 3, 5, 6, 8, 9). Boxplots at the bottom show sample size and distribution of the clinical endpoint within each subgroup (node).
In the example shown in Figure 2, the iterative identification of more homogeneous subgroups is based on 2 significant early neurological predictors, namely UEMS and Motor ZPP, as measured within the first 2 weeks after cervical sensorimotor complete SCI. Successive UEMS or ZPP cutoff values are indicated at the “branch points” 1, 2, 4, and 7. At each branch point, a multiple testing–adjusted P value is given, which describes the strength of the statistical association between the early predictor (UEMS or ZPP) and the endpoint (Total-UEMS). The full distribution of the Total-UEMS is revealed in the box plots within the final nodes at the bottom. The visual representation of a conditional tree is directly interpretable and easily applied in a clinical setting to determine which subjects to include, exclude, or group together for a desired study. Conversely, conventional linear regression methods will deliver an equation of the type

$$\text{Total-UEMS} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$

(with $x_1, \dots, x_p$ the early predictors and $\beta_0, \dots, \beta_p$ the estimated coefficients), which quantifies how predictors are associated with the chosen endpoint, but does not provide direct decision rules for stratification of subjects into more homogeneous study subgroups.
Figure 3 shows the conditional inference tree (URP-CTREE) for the 2-motor level change within 12 months after cervical complete (AIS A) SCI. The URP-CTREE algorithm led to a partition of the initial EMSCI cervical AIS A population into 2 subgroups based on UEMS (as measured within 2 weeks after injury) and described in the final nodes (nodes 2 and 3). A multiple testing–adjusted P value is given, which describes the strength of the statistical association between the early predictor characteristic (UEMS) and the endpoint (2-motor level change). The full distribution of the clinical endpoint is revealed within nodes 2 and 3. Once again, the visual representation of a conditional tree is directly interpretable and can be implemented in a clinical setting. Conventional logistic regression methods deliver an equation of the type

$$\log \frac{P(Y = 1)}{1 - P(Y = 1)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$

(with $Y = 1$ denoting the achievement of a 2-motor level change, $x_1, \dots, x_p$ the early predictors, and $\beta_0, \dots, \beta_p$ the estimated coefficients) and do not provide direct decision rules for stratification.

Figure 3. Conditional inference tree for the recovery of at least two motor levels (black shading; grey shading indicating failure to achieve this specified endpoint) at 12 months after cervical sensorimotor complete (AIS A) SCI (N=103), using a broad set of neurological and functional predictors assessed within the first two weeks after injury. The upper part represents the splits based on early predictors (here only node 1); the lower part represents the achieved partition of the initial population into 2 more homogeneous subgroups, as represented by the final nodes (nodes 2, 3). Plots at the bottom show sample size and distribution of the clinical endpoint for each subgroup.
Discussion
Clinical prediction models for the prognosis of potential future outcomes as well as for the identification of subgroups of SCI patients having predictable recovery patterns are essential.1,4,24 Several attempts have been made to create clinical algorithms for the prediction of future clinical endpoints and for patient stratification in SCI.7-9 These attempts rely on the statistical techniques of multiple linear regressions and logistic regression (for which the following considerations also apply). Despite achieving in some cases excellent discrimination in prediction of future clinical endpoints,7,8 these approaches present shortcomings (see Table 3 for an overview) that may have hindered their wider application in SCI. Here, we present unbiased recursive partitioning’s conditional inference trees (URP-CTREE) as a statistical method that overcomes some of these challenges.
Table 3. Summary of key differences between unbiased recursive partitioning (URP-CTREE) and multivariate linear/logistic regression models.
Comparable Prediction Accuracy
In an attempt to overcome some of the drawbacks of established regression models, we applied the statistical method of conditional inference trees from the family of unbiased recursive partitioning methods (URP-CTREE) for the first time in SCI. As a first prerequisite for its consideration, we established that URP-CTREE provides comparable statistical accuracy in predicting selected clinical endpoints (future outcomes) from a broad set of early clinical characteristics taken from neurological and functional assessments across a sample of cervical sensorimotor complete SCI subjects (Tables 1 and 2). In both analyses, median estimates of accuracy are similar for all 3 methods tested. Overlapping confidence intervals based on resampling techniques indicate no statistically significant differences in accuracy across statistical methods. In general terms, there is no consensus on how to define standard reference values for accuracy (eg, correlation coefficient and the differentiation between weak, moderate, and strong); accuracy has to be evaluated depending on the specific setting of application.
Drawbacks of Established Regression Methods
Even though linear models are powerful statistical tools, they may lack specific information that may be essential for clinical applications. Multiple linear regression quantifies how a given set of early predictors associates with the mean of a future clinical endpoint and provides a numeric equation of these relationships. Linear predictor–outcome relationships are tacitly assumed and statistical interactions rarely modeled (Table 3). Especially in complex neurobiological settings, the assumption of a strictly linear relationship between predictors and clinical outcome, with no interactions between predictors, does not seem sensible.1,15,16,25 In addition, focusing on just parameter estimation (eg, the mean) prevents an understanding of the full endpoint distribution within a study population (for which the mean is only its central tendency).
Even more importantly in the context of clinical trials, neither linear nor logistic regression provides a direct and objective means of partitioning an initial, heterogeneous population into more homogeneous subgroups, leaving the need for stratification unmet. For example, a fitted linear regression model provides a mathematical equation (see Results section) quantifying the relationships between predictors and endpoints. Such an equation remains difficult to apply in clinical settings, where it cannot easily be used to inform the prognosis for an individual subject (Table 3). In addition, multiple regression showed a higher upper bound for the confidence interval (Table 1). This is likely due to collinearity, a situation that arises when different predictors are highly correlated, causing difficulties in model fitting and interpretation. Collinearity is an issue in established regression techniques but is prevented in URP-CTREE (and LASSO).
Conditional Inference Trees
In contrast to established regression methods, URP-CTREE does not assume linear dependence between predictors and endpoint, and it specifically puts the modeling focus on interactions between predictors10 (Table 3). In addition, URP-CTREE has the major advantage of defining more homogeneous subgroups in a direct, data-driven manner and of revealing the clinical endpoint distribution within each subgroup10 (Table 3). These unique features of URP-CTREE could be of value for refining the stratification of patients for future clinical trials. URP-CTREE could also be used as an explorative tool for defining sensible primary and secondary outcomes for specific subgroups.
Total-UEMS and 2-Motor Level Analyses
The specific advantages of URP-CTREE outlined above are clearly visible in the results provided by the 2 analyses performed. Figure 2 represents the application of URP-CTREE to the clinical endpoint of Total-UEMS at 6 months after SCI. The occurrence of subsequent splits along the same UEMS scale within Figure 2 strongly suggests that it cannot be assumed that the recovery between the baseline and 6 months occurs as a linear function. This is not a new finding1,15,16,25 but stresses the importance of nonlinear effects and interactions, which are readily detected by URP-CTREE.10 However, routine regression analyses tacitly assume linearity and usually do not account for interactions. Figure 2 also clearly suggests that even within the narrowly defined patient subgroup of cervical sensorimotor complete (AIS A) subjects there will be variability in recovery, underscoring the limited value of the AIS grade as a stratification tool and of a change in AIS grade as a sensitive measure of any subtle or meaningful therapeutic effect.1,16,25 In our sample of sensorimotor complete SCI subjects, URP-CTREE also provides a way of identifying subject subgroups that will potentially show floor (Figure 2, node 3) and ceiling (Figure 2, node 9) effects, which is of great relevance for the planning of clinical trials and the definition of primary endpoints.
Figure 3 represents the application of URP-CTREE to a binary clinical endpoint, 2-motor level change within 1 year after injury. The analysis suggests that cervical AIS A subjects with an initial UEMS less than or equal to 20 (node 2, Figure 3) have a much lower probability of spontaneous recovery of at least 2 motor levels than subjects with a higher initial UEMS (node 3, Figure 3). URP-CTREE shows that selecting subjects only from node 2 of Figure 3 would provide even more homogeneous subgroups than the inclusion of all cervical sensorimotor complete subjects,16 which would translate into a lower false positive rate for a possible clinical study based on such an outcome. Nevertheless, the purpose of the 2-motor level change analysis was to demonstrate that URP-CTREE works for different types of clinical endpoints (continuous Total-UEMS and binary 2-motor level change) measured at 2 different time points (6 and 12 months after injury), and this partitioning within the population of cervical complete SCI may not always be necessary or preferred. In fact, the overall percentage of subjects recovering at least 2 motor levels within 1 year after injury (32/117; 27%) agrees with previous findings,16 and this endpoint has been suggested as a clinically meaningful primary outcome for the cervical sensorimotor complete (AIS A) SCI population as a whole.
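To illustrate how directly such a tree translates into a decision rule, the single split of Figure 3 can be written as a short function. This is a minimal sketch: the cutoff of 20 is the one quoted in the text, whereas the function name `figure3_node` is hypothetical and the node-specific recovery probabilities are not reproduced here.

```python
UEMS_CUTOFF = 20  # split at node 1 of Figure 3, as quoted in the text


def figure3_node(initial_uems):
    """Return the final node (2 or 3) for an initial UEMS between 0 and 50."""
    if not 0 <= initial_uems <= 50:
        raise ValueError("UEMS must lie between 0 and 50")
    return 2 if initial_uems <= UEMS_CUTOFF else 3


# Hypothetical subjects with early UEMS values of 12, 20, and 24.
print([figure3_node(score) for score in (12, 20, 24)])
```

A regression equation, by contrast, returns a predicted value for each subject and leaves the choice of a stratification cutoff to the investigator.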
Homogeneous Subgroups
Our results indicate that URP-CTREE, while being specifically designed to identify more homogeneous subgroups within an initially heterogeneous population, does not compromise prediction accuracy when compared to established statistical regression approaches. Notably, the same conclusion is reached for 2 different clinical endpoints (continuous Total-UEMS change and binary 2-motor level change) measured at 2 different time points (Figures 2 and 3). Ancillary analyses based on Total-UEMS after 12 months and 2-motor level change after 6 months provided similar results (similar medians and overlapping confidence intervals) and further evidence for our conclusions.
We based our analyses on a sensorimotor-complete SCI population, which is usually recognized clinically as a rather homogeneous population in the context of SCI. Our analyses show that even within the narrowly defined patient subgroup of cervical sensorimotor complete (AIS A) subjects, there will be substantial variability in recovery (Figure 2), suggesting the need for a more differentiated approach. Nonetheless, we recognize that the full potential of URP-CTREE is expected to be realized in even more heterogeneous populations, for example, individuals living with incomplete SCI.
Choosing Endpoints and Predictors
As with every statistical modeling approach, including URP-CTREE, the choice of the clinical endpoint is central because it directly influences (a) which predictors are significantly associated with it (step 1 of URP-CTREE), (b) where the dichotomous splits are set (step 2 of URP-CTREE), and (c) how resulting subgroups (nodes) are defined. In short, the clinical endpoint should be chosen to ensure that it is appropriate to the “target” of the therapeutic intervention. The same cannot be said for early predictors; any number of reasonable data traits can be entered into URP-CTREE, with the only consequence being a more stringent correction for multiple testing (eg, Bonferroni), and URP-CTREE will still identify only those predictors that are significantly associated with the chosen clinical endpoint. URP-CTREE can handle all types of predictors and several types of clinical endpoints.10
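The effect of adding predictors on the stringency of the correction can be shown with a small numerical sketch. The raw P value of 0.004, the significance level of 0.05, and the function name `bonferroni_adjust` are assumptions chosen for illustration only.

```python
ALPHA = 0.05  # assumed significance level


def bonferroni_adjust(raw_p, n_predictors):
    """Bonferroni-adjusted P value when n_predictors are tested at once."""
    return min(1.0, raw_p * n_predictors)


# The same raw association evaluated alongside 10 and then 20 candidate predictors.
for n_predictors in (10, 20):
    adjusted = bonferroni_adjust(0.004, n_predictors)
    print(n_predictors, adjusted, adjusted <= ALPHA)
```

With 10 candidate predictors the association remains significant after adjustment, whereas with 20 it does not. The set of candidate predictors should therefore still be limited to clinically reasonable variables.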
Limitations
Linear and logistic regression models are most useful when the relationship between early predictors and the clinical endpoint under investigation is truly linear, but this has not been demonstrated for SCI 1 or any other central nervous system disorder. In settings where this assumption holds, established regression methods are likely to outperform URP-CTREE in terms of prediction accuracy.
The present study was designed to introduce URP-CTREE and assess its value for predicting future clinical endpoints and stratifying heterogeneous populations. While the resampling technique confirms the validity of our conclusions, generalizability of URP-CTREE trees shown here should be evaluated in independent samples of subjects.
Many clinical assessments like UEMS are analyzed as sum scores of different items and often treated as continuous variables for further analysis, even though they are ordinal scales. We acknowledge that this could provide misleading results if sum scores do not represent a consistent scoring metric. Rasch analysis could provide insight into the measurement properties of commonly used clinical assessments 26 and produce a measurement scale that can be more confidently analyzed, but this is beyond the scope of the present study.
Conclusion
The results of our analysis show that URP-CTREE provides advantages over established multivariate linear and logistic regression techniques without compromising prediction accuracy. Above all, conditional inference trees are specifically designed to identify more homogeneous subgroups within an initial heterogeneous patient population. Data-driven, objective decision rules for more homogeneous subgroup identification can be created and easily implemented in clinical studies. URP-CTREE can be applied to all kinds of regression problems, and could therefore be applied to a wide range of neurological disorders where the identification of more homogeneous subgroups is desired.
Footnotes
Acknowledgements
The authors acknowledge the support of the European Multicenter study about Spinal Cord Injury network (EMSCI), the International Foundation for Research in Paraplegia (IFP), the Spinal Cord Outcomes Partnership Endeavor (SCOPE), and the Clinical Research Priority Program in Neuro-rehabilitation of the University of Zurich. We appreciated the constructive comments of Linda Jones on an early draft of this article and the continuous assistance of René Koller with the EMSCI database.
Authors’ Note
Rainer Abel, Doris Maier, Martin Schubert, Norbert Weidner, Rüdiger Rupp, and Armin Curt are members of the EMSCI (European Multicenter study about Spinal Cord Injury) Study Group.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
