Abstract
Given the importance of accurately and reliably assessing disability in future clinical trials, which will test therapeutic strategies in acute spinal cord injury (SCI), we sought to comprehensively appraise studies that focused on the psychometric properties (i.e., reliability, validity, and responsiveness) of all previously used outcome measures in the SCI population. The search strategy included Medline, CINAHL, EMBASE, and Cochrane databases. Two reviewers independently assessed each study regarding eligibility, level of evidence (using Sackett's criteria), and quality. Of 363 abstracts captured in our search, 36 full articles fulfilled the inclusion and exclusion criteria. Eight different outcome measures were used to assess disability in the SCI population, including Functional Independence Measure (FIM), Spinal Cord Independence Measure (SCIM), Walking Index for Spinal Cord Injury (WISCI), Quadriplegia Index of Function (QIF), Modified Barthel Index (MBI), Timed Up & Go (TUG), 6-min walk test (6MWT), and 10-m walk test (10MWT). While 19 of 36 studies provided level-4 evidence, the remaining 17 articles were classified as level-2b evidence. Most of the instruments showed convergent construct validity in the SCI population, but criterion validity was not examined due to the lack of a gold standard for assessment of disability. All instruments were tested in the rehabilitation and/or community setting, but only FIM was examined in the acute care setting. Based on our results of quality assessment, the SCIM has the most appropriate performance regarding the instrument's psychometric properties. Nonetheless, further investigations are required to confirm the adequate performance of the SCIM as a comprehensive measure of functional recovery in patients with SCI in rehabilitative care. The expert panel of the Spinal Cord Injury Solutions Network (SCISN) that participated in the modified Delphi process endorsed these conclusions.
Introduction
While measures of impairment are commonly used as the primary outcome measure in clinical trials of SCI, assessment of disability after SCI is an important secondary endpoint. As defined by the International Classification of Functioning, Disability, and Health from the World Health Organization (WHO), disability is related to the level of “activity” (WHO, 2001). Ideally, the instrument of choice for assessment of disability in the clinical practice and research areas of SCI should be appropriate for descriptive and evaluative purposes in accordance with the framework of Kirshner and Guyatt (Kirshner, 1985).
Given the significance of assessment of disability in anticipated future clinical trials, which will test therapeutic strategies in patients with acute SCI, we sought to review comprehensively the studies that focused on the psychometric properties (i.e., reliability, validity, and responsiveness) of all previously used outcome measures in the SCI population.
Methods
This systematic review included all outcome measures of disability after traumatic SCI that were published at least twice in the literature. Based on the examination of their psychometric properties including reliability, validity, and responsiveness, we sought to answer the following key question: What is the most reliable, validated, and responsive outcome measure of disability for patients with acute traumatic SCI?
Inclusion and exclusion criteria
For this purpose, we selected all original articles that examined at least one of the psychometric properties of an outcome measure of disability in the setting of traumatic SCI. We included only those outcome measures of disability whose psychometric properties were examined in at least two publications. We excluded case reports, editorial articles, and meeting abstracts.
Literature search strategy
The primary literature search was performed using Medline, CINAHL, EMBASE, and Cochrane databases. A secondary search strategy included articles referenced in the meta-analyses and in the systematic and non-systematic review articles that were captured in the primary search strategy.
The literature searches addressed publications from 1966 to April 2008. The search strategy included the following key words: “disability,” “activity,” “activity of daily living,” “functional outcome,” and “functional recovery.” Those specific key words were paired with the following Medical Subject Headings (MeSHs): “spinal cord injury,” “SCI,” “tetraplegia,” “quadriplegia,” and “paraplegia.” The literature search was limited to papers written in English only. Subsequently, the terms of each outcome measure captured using the above search tactic were paired again with the generic MeSHs including “spinal cord injury,” “SCI,” “tetraplegia,” “quadriplegia,” and “paraplegia.”
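The pairing of key words with MeSH terms described above can be sketched as follows. This is only an illustration: the Boolean `AND` syntax and the function name `build_query_pairs` are assumptions, since the exact query strings submitted to each database are not reported here.

```python
from itertools import product

# Key words and MeSH terms as listed in the search strategy.
KEY_WORDS = [
    "disability",
    "activity",
    "activity of daily living",
    "functional outcome",
    "functional recovery",
]
MESH_TERMS = [
    "spinal cord injury",
    "SCI",
    "tetraplegia",
    "quadriplegia",
    "paraplegia",
]


def build_query_pairs(key_words, mesh_terms):
    """Pair every key word with every MeSH term (hypothetical AND syntax)."""
    return [f'"{kw}" AND "{mesh}"' for kw, mesh in product(key_words, mesh_terms)]


queries = build_query_pairs(KEY_WORDS, MESH_TERMS)
print(len(queries))  # number of key word / MeSH combinations
print(queries[0])
```

The same helper could be reused for the second pass, in which the name of each captured outcome measure replaces the generic key words.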
Data abstraction and synthesis
For the culling process, two reviewers (JCF and VN) independently selected the articles that fulfilled the inclusion and exclusion criteria for each topic. Disagreements were resolved by discussion and consensus between both reviewers.
A research assistant extracted the relevant data from each selected article. Subsequently, both reviewers examined all clinical studies with regard to the extracted data and, thereafter, determined the level of evidence according to Sackett and associates (2000). Using the quality criteria by Terwee and associates, every instrument of assessment of disability was examined with regard to its psychometric properties (Higginson, 2007; Terwee et al., 2007). Divergences during those steps were resolved by consensus between both reviewers. The main results from each article and the reviewers' assessments were included in summary tables.
Definitions of the psychometric properties
For the purpose of this systematic review, the psychometric properties were classified according to Terwee and associates (2007) (Table 1). Content validity refers to the extent to which the items in the instrument comprehensively represent the concepts of interest (Guyatt et al., 1993). Internal consistency refers to the extent to which items in the instrument (sub)scale are homogeneously correlated and, hence, measure the same concept (Terwee et al., 2007). Criterion validity refers to the degree to which the instrument's measurements agree with those of the criterion or “gold standard” (Furlan et al., 2008). Given that there is no well-established gold standard for assessing disability in the SCI population, criterion validity was not assessed. Construct validity is commonly divided into convergent or divergent. While convergent construct validity indicates the degree of similarity between two constructs that theoretically should be related to each other, divergent construct validity reveals how dissimilar two constructs are that in theory should not be related to each other (Furlan et al., 2008). Reproducibility refers to the degree to which repeated measurements in stable patients provide similar results (Terwee et al., 2007). Reproducibility is generally divided into agreement and reliability. While agreement reflects the absolute measurement error, reliability refers to the degree to which patients can be distinguished from each other, regardless of measurement error (Terwee et al., 2007). Responsiveness concerns the ability of a measurement instrument to detect change accurately when it has occurred (de Bruin et al., 1992). Floor or ceiling effects occur when more than 15% of examined patients reach the lowest or highest possible score, respectively (McHorney and Tarlov, 1995). Finally, interpretability concerns the degree to which one can assign qualitative meaning to quantitative scores (Lohr et al., 1996).
MIC, minimal important change; SDC, smallest detectable change; LOA, limits of agreement; ICC, Intraclass correlation; SD, standard deviation.
+ = positive rating; ? = indeterminate rating; − = negative rating; 0 = no information available.
Doubtful design or method: lack of a clear description of the design or methods of the study, sample size smaller than 50 subjects (should be at least 50 in every [subgroup] analysis), or any important methodological weakness in the design or execution of the study.
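The 15% threshold for floor or ceiling effects given in the definitions above lends itself to a simple check. The following is a minimal sketch under stated assumptions: the function name `floor_ceiling_effects`, its parameters, and the example scores are hypothetical and are not taken from any of the reviewed studies.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Return the share of patients at each extreme and whether it exceeds the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        # An effect is flagged only when MORE than 15% sit at the extreme.
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Made-up scores on a hypothetical 0-to-100 scale (not data from any reviewed study).
example_scores = [100, 100, 100, 85, 70, 60, 55, 40, 20, 0]
result = floor_ceiling_effects(example_scores, min_score=0, max_score=100)
print(result)
```

In this made-up sample, 3 of 10 patients reach the maximum score, so a ceiling effect would be flagged, whereas the single patient at the minimum stays below the threshold.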
Establishment of recommendations
Based on the review information summarized in the tables, the authors answered the specific question formerly elaborated. Using a modified Delphi method, an expert panel composed of scientists and clinicians in the Spinal Cord Injury Solutions Network (SCISN) examined the summary tables and the answer to the focused question, and eventually determined the evidence-based recommendations (Reid, 1993).
Results
Literature search
Of 363 abstracts captured in our search, 36 full articles fulfilled the inclusion and exclusion criteria and were reviewed by the two reviewers. There were eight different outcome measures that were used to assess disability in the SCI population as follows: i. Functional Independence Measure (FIM) (Beninato et al., 2004; Davidoff et al., 1990; Dijkers and Yavuzer, 1999; Grey and Kennedy, 1993; Hall et al., 1999; Kucukdeveci et al., 2001; Lawton et al., 2006; Lundgren-Nilsson et al., 2006; Marino et al., 1993; Masedo et al., 2005; Nilsson et al., 2005; Roth et al., 1990; Segal et al., 1993; Yavuz et al., 1998); ii. Spinal Cord Independence Measure (SCIM) (Catz et al., 1997, 2002, 2007; Catz, Itzkovich, Agranov et al., 2001; Catz, Itzkovich, Steinburg et al., 2001; Itzkovich et al., 2002, 2003, 2007); iii. Walking Index for Spinal Cord Injury (WISCI) (Ditunno and Ditunno, 2001; Ditunno et al., 2000, 2007, 2008; Kim et al., 2007; Morganti et al., 2005; van Hedel et al., 2005, 2006); iv. Quadriplegia Index of Function (QIF) (Gresham et al., 1986; Marino and Goin, 1999; Marino et al., 1993, 1995; Yavuz et al., 1998); v. Modified Barthel Index (MBI) (Roth et al., 1990; Kucukdeveci et al., 2000); vi. Timed Up & Go (TUG); vii. 6-min walk test (6MWT) (Olmos et al., 2008; van Hedel et al., 2005, 2006); and viii. 10-m walk test (10MWT) (Olmos et al., 2008; van Hedel et al., 2005, 2006, 2008).
Our search also captured the Spinal Cord Injury Functional Ambulation Inventory (SCI FAI) as another instrument for the assessment of disability in the SCI population (Field-Fote et al., 2001). However, there was only the original publication of this outcome measure in the literature and, hence, this instrument was excluded from our systematic review.
While 19 of 36 studies provided level-4 evidence, the remaining 17 articles were classified as level-2b evidence (Table 2). Convergent construct validity (n = 23) and reproducibility (n = 14) were the most commonly studied psychometric properties. In addition, other psychometric properties were also examined, including content validity (n = 6), internal consistency (n = 3), item generation/reduction (n = 2), and responsiveness (n = 2).
SCI, spinal cord injury; MVA, motor vehicle accident; AIS, American Spinal Injury Association (ASIA) Impairment Scale; TBI, traumatic brain injury.
Using the criteria of Terwee and associates (2007), each instrument was assessed with regard to its quality based on the literature (Table 3). Generally speaking, most of the instruments showed convergent construct validity in the SCI population, but criterion validity was not examined due to the lack of a gold standard for the assessment of disability. Content validity was examined only for QIF and WISCI. The most reliable instruments included FIM and SCIM, whereas many other instruments were not tested for reproducibility. While appropriate responsiveness was reported for SCIM, 6MWT, and 10MWT, it was uncertain for QIF and WISCI. Negative ceiling/floor effects were documented for WISCI and FIM, whereas TUG, 6MWT, and 10MWT showed adequate results in the evaluations of floor/ceiling effects. All instruments were tested in the rehabilitation and/or community setting, but they were not examined in the acute care setting, except for FIM. Of note, inadequate construct validity and negative floor/ceiling effects were observed in the one study in which FIM was tested in the acute care setting (Davidoff et al., 1990).
+, positive rating; −, negative rating; ?, indeterminate rating due to lack of information or poor study design /method; NA, not applicable; NR, not reported.
Discussion
Our systematic review identified 36 clinical studies that examined psychometric properties of eight instruments of disability assessment. Those included FIM, SCIM, WISCI, QIF, MBI, TUG, 6MWT, and 10MWT, which were reported in original articles of level-4 or level-2b evidence. Although criterion validity was not examined due to the lack of a gold standard, several other psychometric properties were studied, including item generation/reduction, reproducibility, internal consistency, convergent construct validity, content validity, and responsiveness. While all instruments were tested in the rehabilitation and/or community setting, only FIM was examined in the acute care setting, where inadequate construct validity and floor/ceiling effects were found.
Modified Barthel Index (MBI)
The Barthel Index, which was published in 1965, was originally developed for use in rehabilitation patients with stroke and other neuromuscular or musculoskeletal disorders (Mahoney and Barthel, 1965). In 1979, following a number of revisions to the Barthel Index, the MBI was derived. The MBI is a 10-item ordinal scale (range: 0 to 100) with ratings for feeding, moving from wheelchair to bed and return, grooming, transferring to and from a toilet, bathing, walking on a level surface, going up and down stairs, dressing, and continence of the bowels and bladder (Granger et al., 1979).
In our systematic review, there were only two prior studies that examined psychometric properties of the MBI in patients with SCI. A previous study evaluated the agreement among raters, but its quality varied from adequate to inadequate. Internal consistency was adequate in one prior study that used a Turkish version of the MBI. While one previous study indicated an adequate construct validity of the MBI, another study showed inconsistencies in the MBI with regard to its construct validity.
Quadriplegia Index of Function (QIF)
The QIF was originally developed to overcome the limitations of the Barthel Index in the assessment of disability of patients with tetraplegia (Gresham et al., 1986). The QIF is a 10-item ordinal scale (range: 0 to 100) that includes assessments of transfers, grooming, bathing, feeding, dressing, wheelchair mobility, bed activities, bladder management, bowel management, and understanding of personal care (Gresham et al., 1986).
Our systematic review identified five original articles where psychometric properties of the QIF were studied. While internal consistency was assessed as adequate in one prior study, reliability was examined in another study that had an indeterminate rating of quality due to a lack of information or poor study design/method. Convergent construct validity was consistently assessed as adequate in all five prior studies. However, a prior study on content validity and another study on responsiveness of QIF had an indeterminate rating of quality due to a lack of information or poor study design/method.
Functional Independence Measure (FIM)
Although FIM was developed for assessment of disability in patients with stroke and to assess the requirements for burden of care, this instrument has been widely used in the assessment of disability in spinal cord injured patients (Kirshblum et al., 2004). The FIM is an 18-item ordinal scale (range: 18 to 126) with seven levels per item (from complete independence to total assistance) that includes assessment of disability in the areas of self-care, sphincter control, mobility, locomotion, communication, psychosocial adjustment, and cognitive function (Keith et al., 1987). The physical FIM subscore refers to the summed subscores for self-care, sphincter control, mobility, and locomotion items, whereas the cognitive FIM subscore includes the subscores for communication, psychosocial adjustment, and cognitive function.
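The way item ratings roll up into the physical and cognitive FIM subscores can be sketched as below. This is an illustration only: the coding of each of the seven levels as an integer from 1 to 7, the number of items assigned to each domain in the example, and the function name `fim_subscores` are assumptions, and the patient is hypothetical.

```python
PHYSICAL_DOMAINS = ("self-care", "sphincter control", "mobility", "locomotion")
COGNITIVE_DOMAINS = ("communication", "psychosocial adjustment", "cognitive function")


def fim_subscores(ratings):
    """Sum item ratings (assumed coded 1 to 7) into physical, cognitive, and total scores."""
    expected = set(PHYSICAL_DOMAINS) | set(COGNITIVE_DOMAINS)
    if set(ratings) != expected:
        raise ValueError("ratings must cover exactly the seven FIM domains")
    items = [r for domain in ratings.values() for r in domain]
    if len(items) != 18:
        raise ValueError("the FIM has 18 items")
    if any(r not in range(1, 8) for r in items):
        raise ValueError("each item rating must be an integer from 1 to 7")
    physical = sum(sum(ratings[d]) for d in PHYSICAL_DOMAINS)
    cognitive = sum(sum(ratings[d]) for d in COGNITIVE_DOMAINS)
    return {"physical": physical, "cognitive": cognitive, "total": physical + cognitive}


# Hypothetical patient; the per-domain item counts are an assumption.
example = {
    "self-care": [7, 6, 6, 5, 5, 6],
    "sphincter control": [4, 4],
    "mobility": [5, 5, 4],
    "locomotion": [3, 2],
    "communication": [7, 7],
    "psychosocial adjustment": [7],
    "cognitive function": [7, 6],
}
print(fim_subscores(example))
```

Under this coding, a patient rated at the top level on all 18 items would total 126, which illustrates how a largely independent patient can reach the upper end of the scale.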
In our systematic review, FIM had the greatest number of publications among all instruments used for the assessment of disability in the SCI population. Findings on the reliability, internal consistency, and construct validity of FIM were inconsistent, with only some studies rating them as adequate. In addition, the responsiveness and interpretability of FIM were examined in a number of studies that were rated as indeterminate due to a lack of information or poor study design/method. Moreover, a negative ceiling effect, whereby the instrument cannot detect changes in patients who score above some “ceiling,” was consistently documented in four previous studies.
Spinal Cord Independence Measure (SCIM)
While the FIM is reportedly reliable and valid as a disability assessment instrument for various patient groups, including spinal cord injured individuals, the SCIM was specifically developed for patients with SCI, and its use is gradually increasing (Catz et al., 1997; Dodds et al., 1993; Ottenbacher et al., 1996). The SCIM is an attempt to minimize some of the shortcomings observed in the FIM when applied to patients with SCI, such as the ceiling effect (Catz et al., 1997; Hall et al., 1999). The SCIM is a 16-item ordinal scale (range: 0 to 100) that includes three levels of activity (i.e., self-care, respiratory and sphincter management, and mobility) that are weighted according to their clinical relevance (Catz et al., 1997). A second version (SCIM II) with improved phrasing of some of the components was reported to be reliable and valid among individuals with SCI (Catz, Itzkovich, Steinburg et al., 2001). Given the results of a Rasch analysis enriched by critiques of experts, the third version (SCIM III) incorporated several modifications including the addition of upper- and lower-body subitems for bathing and dressing, and the addition of another item on ground–wheelchair transfer (Itzkovich et al., 2002, 2007).
Based on the results of our systematic review, all three versions of the SCIM were examined in eight clinical studies that were multicenter on two occasions but led by the same group of investigators in Israel. Internal consistency was reportedly adequate in all four studies that evaluated this psychometric property in SCIM II and III. While the SCIM I and III consistently showed adequate reliability, there are concerns with regard to the reliability of SCIM II. Convergent construct validity was adequate for the SCIM II in one prior study, but it was assessed as inadequate for SCIM III in another previous study. The SCIM I and III were also reported to have adequate responsiveness in two previous studies.
Walking Index for Spinal Cord Injury (WISCI)
The WISCI was developed more specifically to evaluate patients with SCI with respect to their walking recovery because (i) this is of great interest for most individuals with SCI during rehabilitative care; (ii) it is poorly assessed by FIM; and (iii) there was supposedly a need for an instrument that assessed walking disability in humans matching with the commonly used locomotor scale developed by Basso, Beattie, and Bresnahan (BBB) for preclinical studies of treatment for SCI (Basso et al., 1996; Ditunno et al., 2000). The WISCI I is a 19-level hierarchical scale where the levels are scored from 1 (patient can ambulate less than 10 m using parallel bars, with braces, and with the physical assistance of two persons) to 19 (patient can ambulate at least 10 m with no devices, no braces, and no physical assistance) (Ditunno et al., 2000). Based on the experience of the use of WISCI in a randomized clinical trial of Body Weight Support Training, a revision of this scale was proposed, and therefore the WISCI II became a 21-level hierarchical scale where the levels are scored from 0 (patient is unable to walk) to 20 (patient can walk without braces and/or devices and without physical assistance for at least 10 m) (Ditunno and Ditunno, 2001).
The results of our systematic review suggest that WISCI I showed adequate agreement in one prior study, but the reliability of WISCI II was not confirmed in another study. While content validity and convergent construct validity were assessed as adequate for WISCI I in one previous study, there were considerable inconsistencies among the seven other studies that examined convergent construct validity of WISCI II. In addition to the uncertainty regarding the responsiveness of WISCI II in three prior studies, ceiling effects and inadequate interpretability of WISCI II were noted in previous publications.
Timed Up & Go (TUG)
The TUG was originally developed as a measure of balance in elderly people (Mathias et al., 1986). The TUG is a timed walking test that measures the time (in seconds) for a patient to stand up from an armchair, walk 3 m, return to the chair, and sit down (Podsiadlo and Richardson, 1991).
In our systematic review, the two prior studies on the psychometric properties of TUG in patients with SCI indicate adequate performance of this test regarding floor/ceiling effects. In one previous study, agreement of the TUG was reported as inconsistent, but the TUG showed adequate convergent construct validity.
6-min walk test (6MWT)
The 6MWT was initially used to assess cardiovascular exercise capacity in elderly patients with congestive heart failure or chronic lung disease (Butland et al., 1982; Guyatt et al., 1985; Roomi et al., 1996). This is a straightforward measure of the distance (in meters) that a patient can walk within 6 min (Butland et al., 1982).
In the SCI population, the psychometric properties of the 6MWT were documented in three previous studies that were captured in our systematic review. For this timed walking test, reliability was uncertain, and agreement was found to be inadequate based on one previous study. While two previous studies assessed convergent construct validity of the 6MWT as adequate, a third study suggested its inadequacy. Responsiveness of the 6MWT was found to be adequate based on the results of one prior study. The adequate performance of the 6MWT in terms of floor/ceiling effects was consistently reported in all three previous studies.
10-m walk test (10MWT)
The 10MWT has been primarily used as a gait measure in patients with different neurologic movement disorders including stroke and Parkinson's disease (Rossier and Wade, 2001; Schenkman et al., 1997; Smith and Baer, 1999). The 10MWT assesses short-duration walking speed by measuring the time (in seconds) that a patient takes to walk a 10-m distance.
In our systematic review, reproducibility of the 10MWT was inconsistent in one previous study. However, all four previous studies that examined the 10MWT in the SCI population reported adequate convergent construct validity and appropriate performance with respect to floor/ceiling effects. In addition, this instrument was assessed as having adequate responsiveness in one previous study.
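Both timed walking tests can be expressed as an average walking speed, which makes a fixed-distance result (10MWT) comparable with a fixed-time result (6MWT). The sketch below assumes a simple distance-over-time calculation; the function name `walking_speed` and the example values are hypothetical and are not data from any reviewed study.

```python
def walking_speed(distance_m, time_s):
    """Average walking speed in metres per second."""
    if distance_m <= 0 or time_s <= 0:
        raise ValueError("distance and time must be positive")
    return distance_m / time_s


# Hypothetical examples, not data from any reviewed study.
speed_10mwt = walking_speed(10, 12.5)    # 10 m covered in 12.5 s
speed_6mwt = walking_speed(300, 6 * 60)  # 300 m covered in 6 min
print(round(speed_10mwt, 2), round(speed_6mwt, 2))
```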
What is the most reliable, validated, and responsive outcome measure of disability for patients with acute traumatic SCI?
In our systematic review, there were eight instruments of disability assessment that have been examined in the SCI population. However, those instruments were mostly tested in the rehabilitation and/or community setting. Only one study reported the use of FIM in the acute care and rehabilitation setting, with a negative evaluation in terms of its construct validity and ceiling effects (Davidoff et al., 1990). Of note, most of their spinal cord injured subjects were assessed using FIM between 2 and 4 months after SCI (or in the subacute stage of SCI) (Davidoff et al., 1990). Our systematic review provides a critique of the psychometric properties of the existing disability scales. Based on our results of quality assessment using the criteria of Terwee and associates, the SCIM has the most appropriate performance with regard to the instrument's psychometric properties. Nonetheless, the paucity of studies on validity, agreement, responsiveness, and interpretability in the setting of acute care suggests that further investigations are required to confirm the adequate performance of the SCIM among patients with SCI in rehabilitative care.
Recommendations
In the Delphi process, a panel of clinical scientific experts in the field of acute SCI (including basic scientists, clinician-scientists, surgeons, rehabilitation specialists, nurses, and clinical epidemiologists) consensually endorsed the recommendation for use of SCIM III in the classification and evaluation of patients with acute SCI. However, the expert panel also recognized that the identification of a need to find a common objective means to assess the effect of surgery is not an isolated break in strategy, but reflects the general trend of modern science and medicine. Hence, there is the need for further investigations to confirm the performance of the SCIM in the acute care setting in a multi-centered trial.
Footnotes
Author Disclosure Statement
No competing financial interests exist. This work was supported by grants from the Rick Hansen Institute and Ontario Neurotrauma Foundation.
