Abstract
Background:
The clinical course of Spinal Muscular Atrophy (SMA) has been transformed by new disease-modifying therapies (DMT). Before DMT, the clinical course of SMA was marked by an inflection point between developmental improvement and degenerative functional losses. The well-established SMA classification by “type” was based upon the age and functional level at which this inflection point manifested. Following DMT, the natural history of SMA has evolved, and this evolution necessitates an updated means to characterize individuals with SMA.
Objective:
To address both clinical and research needs, an updated assessment should target meaningful function; apply broadly across the range of SMA; maximize the granularity of functional tiers with minimal compromise of reliability; require minimal or no specific training; and be validated with measurable consistency between both naïve and experienced clinicians, and between these clinicians and members of the SMA community.
Methods/Design:
Development of the Functional Ability Scale for Evolving SMA (EVOLVE-SMA) proceeded as an iterative scale creation and refinement process using retrospective cohorts (for design, reliability, and validity) followed by prospective assessments of real-world feasibility and reliability.
Results:
Excellent interrater reliability (ICC >0.9, p < 0.001) was established between expert and novice clinicians (physicians and physical therapists) and between clinicians and SMA community members.
Conclusions:
EVOLVE-SMA is a tiered evaluation of function applicable to the new DMT-treated era of SMA. It is a simple, granular, and reliable means to assess meaningful functions that will have value to research, resource allocation, and individual clinical care.
Keywords
Introduction
Spinal Muscular Atrophy (SMA) is a recessive genetic neuromuscular disorder caused by diminished expression of the survival motor neuron (SMN) gene complex. Weakness resulting from skeletal muscle denervation produces motor impairment across a broad range of severity.1,2 The last decade has seen a remarkable transformation of the natural history of individuals with SMA: where progressive motor neuron degeneration and consequent muscle atrophy were once inevitable, three specific disease-modifying treatments (DMT) (nusinersen, onasemnogene abeparvovec-xioi, and risdiplam) now stabilize the neurodegeneration.3–5 Individuals with SMA no longer decline as they once did. When delivered early in infancy, these therapies now dramatically enable the emergence of normal developmental milestones.6–8
Classification systems characterizing SMA have evolved over time. Early clinical investigators considered the broad range of onset ages, the severity of disease expression, and the differences in relative incidence as evidence supporting the existence of distinct, though partially overlapping, disorders. Three eponymous entities gained general acceptance: Werdnig-Hoffmann, Dubowitz, and Kugelberg-Welander diseases.9–12 Following the 1995 identification of mutations in SMN,13 this multi-disorder SMA nosology was replaced by an understanding of SMA as a single disorder manifesting across a broad range. Because of this broad range of clinical manifestations, however, prognostic and care needs led to acceptance of a new classification of three clinical “types” delineated by the same boundaries used for the previous eponymous categories.14 Each type was defined by the combination of two pre-DMT features: the age at onset of symptoms and the highest level of function achieved. “Type 1” (or “Type I”) was defined by manifest weakness in the first six months of life in infants who never sat independently; the “Type 2” (or II) label was applied to those presenting between six and 18 months who could sit but not walk; and “Type 3” (or III) applied to children with later onset who could walk.14 Over time, some favored adding a “Type 0” for infants profoundly weak at birth and a “Type 4” for those first manifesting weakness as adults.15,16 Others favored subdividing the “types” further with letter designations “1a”, “1b”, and “1c” to meet the need for more categories with predictable prognoses.17 Key to all the “type” classifications, however, was the slow, relentless, ingravescent course of increasing weakness after a peak level of motor performance was reached.
Equally important, this and other classifications imply that a category, once assigned, is immutable: individuals retain their labeled “type” for life even as their functional ability declines or improves.
The applicability and value of the older eponymous and subsequent “type” systems of SMA classification have now been undermined by the substantial improvement in clinical course following DMT. Young patients who receive DMT before the onset of clinical degeneration gain skills without functional decline, and those treated later, after the onset of degeneration, fully or partially stabilize against further decline.6,18,19 The fortunate consequence is that the well-characterized, type-defining “peak” transition from advancing to declining motor function no longer occurs. The post-DMT natural history therefore lacks the inflection point, defined by age at onset and maximum achieved motor function, on which the SMA “type” classification was based. The interim response to this new natural history has been to use the labels “walker,” “sitter,” or “non-sitter,” which do not provide enough information about the individual with SMA.
This change has created a need among clinicians, scientists, and the SMA community for a new evaluation of function that is both descriptive and quantitative.20 To be of greatest value for both clinical and research purposes, a new scale should (1) be meaningful to the diverse experience of those affected by SMA, (2) combine maximum granularity with good, quantified reliability, and (3) apply to the full range of SMA severity and age.20–26 Ideally, the scale should also be administrable with a simple set of instructions that have broad cultural applicability and require no special training.
The Functional Ability Scale for Evolving Spinal Muscular Atrophy (EVOLVE-SMA) (Table 1) was developed to address these objectives. EVOLVE-SMA offers an accessible and meaningful common language for clinicians, researchers, people with SMA, and caregivers in this post-DMT era. As an evaluation of functional ability, EVOLVE-SMA is meant to complement the two non-modifiable, fixed characteristics that affect an individual's disease severity (SMN2 copy number and age at initiation of DMT), and it is also intended to accommodate common acquired features of SMA that often confound other SMA outcome measures (e.g., contracture, nutritional state, or age).
Functional Ability Scale for Evolving Spinal Muscular Atrophy (EVOLVE-SMA).
©The Children’s Hospital Colorado 2024
EVOLVE-SMA categorizes the spectrum of an individual's function into one of 12 tiers based on relevant functional abilities (Table 1). These 12 tiers are hierarchically arranged according to important components of independence, establishing EVOLVE-SMA as an evaluation of the aspects of the SMA experience felt to be most meaningful to patients. To avoid confusion with the existing numeric “type” system of SMA classification, the tiers are labeled with the letters “A” through “M”; the letter “I” is excluded because of its similarity to the Roman numeral “I” sometimes used for “type 1”. The defining EVOLVE-SMA tier characteristics avoid medical jargon so that people with SMA, their medical team, and their caregivers all understand its levels equally. This paper describes EVOLVE-SMA and its iterative development and validation process, including the psychometrics of inter- and intra-rater reliability of the scale among various clinician groups and between clinicians and individuals with SMA and their caregivers.
Methods and design
The development and refinement of EVOLVE-SMA are outlined in the study conceptual diagram (Figure 1). The work was conducted in two parts. Study 1 included a comprehensive literature review and a pilot retrospective study of scale development, reliability, and validity. Study 2 assessed EVOLVE-SMA's real-world feasibility and reliability by collecting self-assessments from participants/caregivers and comparing them with tier assignments from clinicians with a broad spectrum of experience.

Diagram of the developmental steps of EVOLVE-SMA.
Ethical considerations
EVOLVE-SMA studies were approved by the Colorado Multiple Institutional Review Board (COMIRB) (COMIRB #: 21-4830 and 23-0578). For study 1, consent or assent was not required because the study, using deidentified retrospective data, was deemed to have no appreciable risk. Data Use Agreements between the three collaborating sites were executed for data sharing. For study 2, informed consent or assent was obtained from each participant or legal guardian before the survey and rater evaluations.
Scale development
The development and evolution of EVOLVE-SMA were modeled on scales used for functional assessment of individuals with two other neurologic disorders: the Expanded Disability Status Scale (EDSS) for people with multiple sclerosis (MS) and the Gross Motor Function Classification System (GMFCS) created for children with cerebral palsy (CP).27,28 Common to these two scales and EVOLVE-SMA is the goal of serving research and clinical care needs by evaluating and classifying the current level of motor function. The functional impairment associated with MS and CP results from complex and overlapping combinations of pathology anywhere in the brain or spinal cord. SMA, in contrast, manifests weakness from skeletal muscle denervation with a general pattern of proximal greater than distal, and lower greater than upper extremity, involvement. This relatively unidimensional expression of SMA neurologic impairment enables a more linear, hierarchically tiered assessment of motor function.
Initial development of the scale tiers occurred in five phases: 1) scale creation, 2) content validity through nominal group process, 3) consensus methods, 4) interrater reliability between expert and novice clinicians, and 5) scale validity through correlation of EVOLVE-SMA tiers with SMA-specific clinical outcome measures. The overall process was divided into two parts. “Study 1” drew on internal expert opinion and experience to refine and establish tier categories, assessed reliability through clinician ratings, and established validity through correlation with SMA-specific outcome measures. “Study 2” applied EVOLVE-SMA to a broad range of the SMA community, including both self-assignment and clinician assignment, and surveyed participants about the process. Study 2 aimed to assess reliability through self- versus clinician ratings and to establish validity through correlation with SMA-specific demographics and Likert scale survey questions. The study phases are characterized in Figure 1.
Population/Recruitment
A retrospective cohort of subjects with SMA was established by the site Principal Investigators (PIs) at each of three sites (Johns Hopkins Health System (JH) in Baltimore, Maryland; Ann & Robert H. Lurie Children's Hospital of Chicago (Lurie) in Chicago, IL; and Children's Hospital Colorado (CHCO) in Aurora, CO). Each site PI recruited team members to serve in the EVOLVE-SMA Working Group as experienced or novice physician and physical therapist (PT) raters (Table 2).
Inclusion and Exclusion Criteria
For the second part of EVOLVE-SMA development, “Study 2”, prospective recruitment at a large SMA-specific community event targeted the greatest possible range of age and function among individuals with SMA. Any person with SMN-deficient SMA aged two years or older was eligible for this convenience sample. Consent was obtained from individuals or their legal guardians. Participants aged 12 years or older completed the survey themselves, with caregiver assistance if physical limitations were an encumbrance; caregivers completed the survey for children under 12 years of age. This age threshold was selected based on anticipated comprehension of the survey questions; COMIRB requires a simplified assent for those <12 years, while children ≥12 years can review the same consent form as their parents. At the event, clinician raters who had participated in the initial development of EVOLVE-SMA assessed community participants using EVOLVE-SMA (Table 2).
Sample size
Sample size was calculated with G*Power software;29,30 the recommended sample size was 163, with the significance level set at p = 0.01 for Studies 1 and 2.
Nominal Group Process and Consensus Methods
Content validity was explored using a modified nominal group consensus method, in which a target group knowledgeable about SMA met in a structured session to discuss the scale and reach consensus.31 The consensus meeting took place after each institution's raters had applied the first version of the EVOLVE-SMA scale (9 tiers), and all raters attending the focus group had participated in those initial ratings. The focus group followed a structured interview guide to refine and update the scale, addressing gaps and limitations identified in the retrospective 9-tier data. This process produced a revised 12-tier EVOLVE-SMA scale.
Study protocol
At the start of Study 1, a chart review was performed on individuals with a clinical diagnosis of SMA who met the inclusion criteria. The following data were compiled from relevant Physical Therapy, Physiatry, or Neurology notes dated before February 2022: age, gender, SMN2 copy number, initial “type” at diagnosis if applicable, specific DMT and age at treatment initiation, and any available SMA-specific functional outcome measure (OM) scores, including the Hammersmith Functional Motor Scale-Expanded (HFMSE), Revised Upper Limb Module (RULM), and 6-Minute Walk Test (6MWT).
Each subject was assigned a unique, site-based study ID number for all de-identified chart reviews. Each site study team consisted of the site PT expert (the site PI), two expert physicians, two expert PTs, one novice physician, and one novice PT. The site PI assigned each subject to an initial tier (“9 Tier PI Assigned Tier”). The physician and PT raters had no special training beyond the information in Table 1. Each rater independently reviewed each chart and recorded the participant's tier on the pilot 9-tier EVOLVE-SMA. These assignments were compiled into a single document containing all raters’ data for each site, and the data were then examined for reliability and validity (Supplemental Table 1).
After the scale was expanded to 12 tiers, site PIs re-evaluated the original retrospective cohort to enable comparisons with SMA-specific demographics and outcome measures (12-tier “PI Assigned Tier”).
Reliability and validity
12-tier EVOLVE-SMA reliability
For the final phase of Study 1, ten mock patient cases were sent to all clinician raters across the three sites to establish interrater reliability among all clinicians and site-specific reliability on the updated 12-tier EVOLVE-SMA. The mock cases were modeled from actual patient cases in a format similar to the chart reviews, with details on demographics, including SMA-specific demographics, SMA OMs, and personal and environmental factors.
Survey evaluating EVOLVE-SMA
For the second portion of EVOLVE-SMA development, Study 2, SMA community member volunteers sequentially answered a four-part survey comprising: 1) questions about demographics and SMA characteristics (current age, gender, place of residence, diagnosed SMA type, number of SMN2 copies, DMT treatment status, and surgical history of the spine and lower extremities); 2) 12 questions about functional ability corresponding to the defined EVOLVE-SMA tiers (Table 1); 3) self-assignment to an EVOLVE-SMA tier after reviewing a handout of Table 1, with instructions to record the tier that best described their “highest functional ability performed within the last four weeks of survey administration” (labeled the “Self-Assigned Tier”); and 4) two questions probing their understanding, opinions, and perceptions of EVOLVE-SMA. In the days following the SMA community study day, one of the Principal Investigators (MMB) used each participant's responses to the 12 functional ability questions to assign a tier labeled the “Questionnaire-Derived Tier.” This tier was derived from the participant's highest reported functional level by stepping through the survey questions from tier “A” downward to identify the highest tier at which the participant reported actual functional ability.
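The stepwise rule for the “Questionnaire-Derived Tier” can be illustrated with a short Python sketch. It is not the study's procedure or software (the tier was assigned by hand): it assumes, hypothetically, that each of the 12 functional-ability questions has been reduced to a yes/no answer keyed by its tier letter, and the name `questionnaire_derived_tier` is illustrative only. Tiers are checked from “A” downward, skipping the unused letter “I”, and the first tier whose ability was reported is returned.

```python
from __future__ import annotations

# Tier letters from strongest ("A") to weakest ("M"); "I" is not used.
TIERS = [letter for letter in "ABCDEFGHIJKLM" if letter != "I"]


def questionnaire_derived_tier(responses: dict[str, bool]) -> str | None:
    """Return the highest tier whose defining ability the respondent reported.

    Hypothetical input format: `responses` maps each tier letter to True when
    the participant reported performing that tier's defining ability within
    the last four weeks. Unanswered tiers are treated as "not performed".
    """
    for tier in TIERS:  # step downward from tier "A"
        if responses.get(tier, False):
            return tier
    return None  # nothing reported: flag for manual review


if __name__ == "__main__":
    # Hypothetical participant reporting the abilities for tiers G, H, and J.
    answers = {tier: tier in {"G", "H", "J"} for tier in TIERS}
    print(questionnaire_derived_tier(answers))  # → G
```

Because the scale items were crafted so that qualifying for a higher tier while failing a lower one is unlikely, taking the first reported ability from the top is a reasonable reading of the stepwise rule; the real survey wording may need a different mapping from answers to yes/no values.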
The web-based survey was accessible from a laptop, tablet, or smartphone and used Likert sliding scales or radio buttons for accessibility. It was delivered through CHCO's REDCap (Research Electronic Data Capture) platform, which collected and managed the study data.32 REDCap is a secure, web-based application designed to support data capture for research studies, providing 1) an intuitive interface for validated data entry; 2) audit trails for tracking data manipulation and export procedures; 3) automated export procedures for seamless data downloads to common statistical packages; and 4) procedures for importing data from external sources. The entire survey took less than 15 minutes to complete. A non-scoring study team member was available to answer general questions, but participants answered the survey independently without intervention.
After participants completed the survey, two clinicians (Raters #1 and #2) independently evaluated each participant to assign a tier. Although the clinician raters knew many participants personally, they were blinded to the participant's self-assessment and to their co-rater's assignment. Clinicians could ask specific questions about participants’ function and request demonstrations of functional skills from EVOLVE-SMA. Each clinician rater was randomly assigned a Rater number (1–25) for identification in later analysis, and raters were randomly assigned as Rater #1 or #2 for each participant throughout data collection. Before administering EVOLVE-SMA at the community event, the clinician raters received a brief general refresher on the study flow but no specific instruction on the defining characteristics of the tiers.
12-tier EVOLVE-SMA validity
Construct and convergent validity were established in Study 1 through the nominal group processes described above and by correlating the “PI Assigned Tier” with demographics and SMA-specific functional OM scores, including HFMSE, RULM, and 6MWT. For Study 2, participants’ SMA-specific demographics (such as SMN2 copy number and assigned “type”) were correlated with the “Self-Assigned Tier.” Likert scale questions at the end of the survey gathered participants’ opinions and perceptions of the EVOLVE-SMA assessment. The first question asked, “Do you feel that one of the letters of the scale accurately describes your (or the individual with SMA for whom you are answering) current abilities?” Responses ranged from zero (“not at all”) to ten (“very much so”). The second question asked, “Do you like the new functional scale for SMA?” Responses ranged from zero (“dislike”) to ten (“like”).
Statistical methods
All data were inspected for accuracy and completeness. A descriptive analysis of means/medians and proportions (SPSS Statistics version 29.0.0)33 characterized the overall study demographics and each study's participants on EVOLVE-SMA. Inter-rater reliability was determined with the Intraclass Correlation Coefficient (ICC) to assess the degree of agreement. Sub-group reliability across expert physicians, expert PTs, all novices, all experts, all physicians, and all PTs at each site was analyzed on the initial 9-tier EVOLVE-SMA. ICC values between 0.75 and 0.9 indicate good reliability, and values >0.9 indicate excellent reliability.34 Interrater reliability across all clinician raters at all sites on the final 12-tier assessment was established with the mock cases via ICCs. The distribution of each institution's retrospective cohort tier assignments on the 12-tier EVOLVE-SMA was tabulated. In Study 2, sub-group reliability was analyzed between the two clinician raters, between the tier of one randomly selected rater (“Rater Assigned Tier”) and the participant's “Self-Assigned Tier,” and between the “Self-Assigned Tier” and the “Questionnaire-Derived Tier.” Spearman's rank correlation coefficients (r) were used to assess convergent validity between participants’ SMA-specific OMs and the tier assigned by the site PI (“PI Assigned Tier”) on EVOLVE-SMA. Relationships were described as weak (r: 0.10 to 0.39), moderate (r: 0.40 to 0.69), strong (r: 0.70 to 0.89), or very strong (r: >0.90).35 The Kruskal–Wallis test was applied to evaluate differences in EVOLVE-SMA tiers across demographic groups, using the “PI Assigned Tier” (Study 1) and the “Questionnaire-Derived Tier” (Study 2). Post-hoc comparisons used Dunn's method with a Bonferroni correction for multiple tests. For all tests, p < 0.05 was considered significant. Descriptive statistics were reported for the Likert survey questions.
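The ICC analyses above can be made concrete with a minimal Python sketch of a two-way consistency ICC computed from ANOVA mean squares (the Table 5 footnote identifies a two-way random, consistency model). The sketch assumes letters are coded A = 1 through M = 12, skipping “I”; the text does not state the coding, although the consistency estimate is the same whether A or M is placed at the low end. It returns point estimates only, in the single-measure and average-measure forms (McGraw and Wong's ICC(C,1) and ICC(C,k)); which form SPSS reported is not stated, and confidence intervals and p-values are not reproduced. All names are hypothetical.

```python
from statistics import fmean

# Assumed coding for illustration: A = 1 ... M = 12, skipping "I".
TIERS = [letter for letter in "ABCDEFGHIJKLM" if letter != "I"]
TIER_RANK = {tier: rank for rank, tier in enumerate(TIERS, start=1)}


def icc_consistency(ratings):
    """Return (single-measure, average-measure) two-way consistency ICCs.

    `ratings` is an n-subjects x k-raters table of numeric scores with no gaps.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


def describe_icc(icc):
    """Label an ICC with the cut-offs quoted in the text (0.75-0.9 good, >0.9 excellent)."""
    if icc > 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    return "below good"


if __name__ == "__main__":
    # Hypothetical tiers from two raters for three participants.
    pairs = [("G", "G"), ("A", "B"), ("M", "L")]
    table = [[TIER_RANK[a], TIER_RANK[b]] for a, b in pairs]
    single, average = icc_consistency(table)
    print(round(single, 3), describe_icc(single))
```

A consistency ICC ignores a constant offset between raters (one rater always one tier lower still yields 1.0), which is why an absolute-agreement model would be the stricter choice if systematic rater bias were a concern.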
Results
Pilot EVOLVE-SMA development (9-tier scale)
The initial draft anchors for EVOLVE-SMA tiers were based on a literature review, experience with established SMA classification systems, features of typical functional development, data from SMA-specific outcome measures,36–38 and three authors’ (MMB, KJK, TOC) clinical experience of the range of functional abilities achieved by individuals with SMA. Community engagement was solicited by sharing drafts with people with SMA and their caregivers across a broad spectrum of age and functional level, to ensure that EVOLVE-SMA was inclusive of the population. Development focused on motor function features important to functional independence across the broad age range of individuals with SMA. Functional movements (such as crawling and standing) that require proximal strength in both the upper extremities and hip girdle were included. An important design goal was a tiered sequence spanning the broad spectrum of functional abilities and ages in individuals with both DMT-treated and untreated SMA, to allow future international adoption of the scale. Additional attention was given to crafting scale items that minimized the possibility of qualifying for a higher item while failing to meet the criteria for a lower-level item. At this review stage, a decision was made to introduce a minimum age for the scale, a small compromise of the initial goal that the assessment apply to the whole population of individuals with SMA. Based on the functional tiers and expected normal developmental milestones, the scale was targeted to individuals ≥2 years old to avoid confounding by the normal timeline of motor milestone emergence.
Nominal group process and consensus methods for EVOLVE-SMA development
The full EVOLVE-SMA working group met to discuss the initial 9-tier draft. Wording and components of different levels were adjusted so that the tiers were described in non-technical language equally accessible to patients, caregivers, and clinicians with varying experience levels. All clinician raters reached consensus that the tier definitions described the functional impairment of individuals with SMA well. Medical and therapy jargon in the tier definitions was revised, and graphic animation of figures was added for additional clarity. The “not allowed” criteria within each tier were also adjusted to better reflect the compensations sometimes used by people with SMA to surmount functional impediments (e.g., orthotics, assistive devices, or use of the arms for additional support).
The scale developers and the working group sought category criteria that would distribute the current SMA population across as many evenly sized tiers as possible without compromising other goals of the assessment. Three new levels were added to the original nine tiers by (1) splitting the highest level to distinguish individuals able to “step up a curb unaided by any hand support,” (2) splitting the lowest tier into two levels based on the need for head support during independently controlled motorized wheelchair use, and (3) adding a new intermediate level, Tier D, separating those able to “walk in the community with aids” from those able only to “take a few steps” within the home. The focus group felt the additional tiers better delineated and represented the functional abilities of all individuals with SMA. Although not initially conceived this way, the developers recognized an unintended but worthwhile clustering of the assessment into three general domains of function: the strongest levels (A–F) involve ambulation/mobility, the intermediate level (G) involves seated mobility and static items including floor transfers and static standing, and the weakest levels (H–M) involve static or supported sitting or lying. These domains differ slightly from the “walker,” “sitter,” and “non-sitter” nomenclature, adding an intermediate category between walker and sitter and combining sitter and non-sitter into the last domain.
Characteristics of study participants and evaluators
Retrospective Patient Characteristics (Study 1)
A total of 186 retrospective charts were evaluated across the three sites (Site 1: 74; Site 2: 48; Site 3: 64). The mean age of patients was 15.8 years (median: 13; range 2–70 years). The distribution of initial SMA “type” using the traditional nomenclature was: Type 1: 18.3% (34); Type 2: 40.9% (76); Type 3: 37.6% (70); and pre-symptomatic: 3.2% (6). The distribution of SMN2 copy number from clinical lab reports was conventional for the prevalent population attending academic neuromuscular clinics: 3 copies 59% (110), 2 copies 20.4% (38), 4 copies 16.1% (30), and 5+ copies 1.6% (3); SMN2 copy number was not available for five patients. The “walker,” “sitter,” or “non-sitter” nomenclature was not captured from chart reviews. Patient demographics and SMA-specific demographics are described in Table 3.
Combined Demographics
Community Participants’ Demographics (Study 2)
Participant demographics and SMA-specific characteristics are shown in Table 3. Of the 171 participants who completed the survey, three had incomplete data, yielding an analyzed cohort of 168 participants. Slightly more females (56%, n = 94) than males completed the survey. The mean age of participants was 21.6 years (SD: 16.3; range 2–66). Of those surveyed, 57.7% (97) were ≥12 years old, and 71 surveys (42.3%) were completed by caregivers.
Clinician Characteristics
The three sites included 19 clinicians: physicians and PTs experienced in SMA and physicians and PTs novice to SMA. Six neurologists and three physiatrists served as physician raters. Table 2 lists the inclusion and exclusion criteria for clinicians and describes “experienced” versus “novice.” Clinician characteristics, including practice and experience in their discipline and with SMA, are summarized in Table 3.
Distribution of EVOLVE-SMA assigned tiers (Studies 1 & 2)
The proportions of patients assigned to each tier by the “PI Assigned Tier” (Study 1) and by the “Self-Assigned Tier,” “Questionnaire-Derived Tier,” and clinician raters (Study 2) are shown in Figure 2 (numerical data, including the pilot 9-tier assignments in Study 1, are given in Supplemental Tables 2 and 3). Every one of the 12 tiers was represented, with the largest proportions in tiers “H” and “J” for all cohort assignments. This may reflect the adult populations who received DMT well after the onset of weakness and disease progression. Comparing the mean tier values across assignments, the “PI Assigned Tier” and “Questionnaire-Derived Tier” means were slightly larger than the “Self-Assigned Tier” mean (7.1 and 7.2 versus 6.5). This may reflect small differences introduced by asking about each tier individually, which additional instructions to participants could reduce in the future (Figure 2). The distribution of tier assignments on EVOLVE-SMA was relatively uniform in both studies: in Study 1, tier proportions ranged from 4.8% to 14.5%, and in Study 2, self-assigned tier proportions ranged from 2.4% to 20.8% (Supplemental Tables 2 and 3). The breakdown of “types” and SMN2 copy numbers within each of the 12 tiers, shown in Table 4, demonstrates that EVOLVE-SMA captures diversity among individuals with SMA in these cohorts.
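The tier percentages and mean tier values above require converting letters to numbers. The text does not state that coding, so the Python sketch below assumes A = 1 through M = 12 (skipping “I”), which places tiers “H” and “J” at 8 and 9; `tier_distribution` is a hypothetical helper, not the study's SPSS workflow.

```python
from collections import Counter
from statistics import fmean

# Assumed numeric coding (not stated in the text): A = 1 ... M = 12, skipping "I".
TIERS = [letter for letter in "ABCDEFGHIJKLM" if letter != "I"]
TIER_RANK = {tier: rank for rank, tier in enumerate(TIERS, start=1)}


def tier_distribution(assignments):
    """Return (percentage of participants per tier, mean numeric tier)."""
    if not assignments:
        raise ValueError("no tier assignments supplied")
    counts = Counter(assignments)
    unknown = set(counts) - set(TIERS)
    if unknown:
        raise ValueError(f"unrecognised tier labels: {sorted(unknown)}")
    total = len(assignments)
    # Counter returns 0 for tiers nobody was assigned to.
    percentages = {tier: 100 * counts[tier] / total for tier in TIERS}
    mean_tier = fmean(TIER_RANK[tier] for tier in assignments)
    return percentages, mean_tier
```

Under this coding, the smallest and largest tier proportions (such as the 4.8% to 14.5% range reported for Study 1) would be `min(percentages.values())` and `max(percentages.values())`, and a lower mean tier would indicate a cohort closer to tier “A”.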

EVOLVE-SMA cohort assignments. a. PI Assigned Tier; b. Questionnaire-Derived Tier; c. Self-Assigned Tier; d. Rater Tier (using Rater #1 assignments).
EVOLVE-SMA Tier Assignment Breakdown by SMN2 Copy Number and Initial SMA Type
Reliability and validity
Reliability of Retrospective Chart Review using 9 Tier EVOLVE SMA Scale (Study 1)
The results provided excellent support for the overall reliability of the tier assignments as indicated by ICCs >0.9 (p < 0.001). (Supplemental Table 1).
Mock Case Reliability (Study 1)
Following refinement of EVOLVE-SMA, ten mock cases were shared among all clinician raters at the three sites. On the new 12-tier EVOLVE-SMA, the ICC across all raters was 0.997 (p < 0.001), indicating excellent reliability for the mock cases (Table 5).
Interrater reliability of the 12-tier EVOLVE-SMA for all cohorts.
*Two-way random consistency.
Inter-Rater Reliability between Clinicians (Study 2)
Two independent clinician rater assessments followed each participant's recorded self-assessment using EVOLVE-SMA. A total of 11 raters with varying levels of experience participated and were randomly designated Rater #1 or #2. Reliability between the two raters was excellent (ICC 0.997, 95% confidence interval (CI) 0.996–0.998, p < 0.001). Comparing all assessments by Rater #1 (Rater 1 group) with the participant self-assignment showed excellent reliability (ICC 0.961, CI 0.947–0.971, p < 0.001), as did the comparison for Rater #2 (Rater 2 group) (ICC 0.959, CI 0.945–0.979, p < 0.001). The ICCs and CIs for Rater Groups 1 and 2 were similar. Table 5 shows each rater group compared with the “Questionnaire-Derived Tier.”
Inter-Rater Reliability Between Participants’ Self-Assignment and Clinician Rating (Study 2)
Comparison of the “Questionnaire-Derived Tier” with the “Self-Assigned Tier” demonstrated excellent reliability (ICC 0.962, CI 0.947–0.973, p < 0.001), as presented in Table 5.
Retrospective Construct (Convergent) Validity (Study 1)
The 12-tier EVOLVE-SMA assignments (“PI Assigned Tier”) were compared with OM scores using Spearman's rank correlations to ascertain convergent validity (Supplemental Table 4). Strong relationships (r: 0.70 to 0.89) were found between EVOLVE-SMA and the RULM total score (right and left side scores added together) and the 6MWT. Very strong relationships (r: >0.90) were found between EVOLVE-SMA and the HFMSE, the RULM for the right arm, and the RULM for the left arm. Of note, the 12-tier and 9-tier EVOLVE-SMA had a very strong correlation of 0.994 (CI 0.991–0.995, p < 0.001).
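Spearman's rank correlation and the strength bands defined in the Statistical methods can be sketched in Python as follows. This is an illustration, not the SPSS computation: rho is computed as the Pearson correlation of tie-averaged ranks, and because the quoted bands leave the boundary between 0.89 and 0.90 implicit, the sketch treats them as half-open intervals on |r| (an assumption). The sign of rho depends on how tiers are coded; if A = 1 denotes the strongest function, higher HFMSE scores would correlate negatively with tier number, so strength is judged on the absolute value. Function names are hypothetical.

```python
from math import sqrt
from statistics import fmean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / spread


def describe_strength(r):
    """Map |r| onto the weak/moderate/strong/very strong bands quoted in the text."""
    size = abs(r)
    if size >= 0.90:
        return "very strong"
    if size >= 0.70:
        return "strong"
    if size >= 0.40:
        return "moderate"
    if size >= 0.10:
        return "weak"
    return "not classified"
```

Averaging ranks over ties matters here because many participants share the same tier; ranking ties arbitrarily would make rho depend on the order in which records happen to be stored.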
Community Construct (Convergent) Validity (Studies 1 and 2)
EVOLVE-SMA tiers were compared across SMA-specific demographics and characteristics using the Kruskal–Wallis test to ascertain convergent validity (Supplemental Table 4).
Study 1: EVOLVE-SMA tier did not differ significantly by gender, χ²(1, N = 186) = 1.120, p = 0.209. Tier differed significantly by SMN2 copy number, χ²(4, N = 186) = 35.442, p < 0.001. Post-hoc comparisons using Dunn's method with a Bonferroni correction for multiple tests indicated that the mean rank of participants with 4 SMN2 copies was significantly closer to Tier A than that of participants with 3 SMN2 copies (p < 0.001), 2 SMN2 copies (p < 0.001), or an unknown SMN2 copy number (p < 0.007); the remaining copy-number comparisons were not significant. Tier also differed significantly by initial SMA type, χ²(3, N = 186) = 91.019, p < 0.001. Post-hoc comparisons indicated that the mean rank of the presymptomatic group was significantly closer to Tier A than that of Type I (p = 0.001) and Type II (p = 0.017), and that the mean rank of Type III was significantly closer to Tier A than that of Type I (p < 0.001) and Type II (p < 0.001); the remaining initial SMA type comparisons were not significant.
Study 2: EVOLVE-SMA tier differed significantly by gender, χ²(1, N = 168) = 6.712, p = 0.010. Tier also differed significantly by SMN2 copy number, χ²(4, N = 168) = 24.280, p < 0.001. Post-hoc comparisons indicated that the mean rank of participants with 4 SMN2 copies was significantly closer to Tier A than that of participants with 3 SMN2 copies (p < 0.001), 2 SMN2 copies (p < 0.001), or an unknown SMN2 copy number (p < 0.007); the remaining copy-number comparisons were not significant. Tier differed significantly by initial SMA type, χ²(4, N = 168) = 53.435, p < 0.001. Post-hoc comparisons indicated that the mean rank of Type III was significantly closer to Tier A than that of Type I (p = 0.007) and Type II (p = 0.01); the remaining initial SMA type comparisons were not significant.
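The Kruskal–Wallis comparisons and Dunn post-hoc tests above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' analysis code: tiers are assumed to be coded as integers with A = 1 (a hypothetical coding), so a lower mean rank means closer to Tier A; the H statistic includes the usual tie correction; and Dunn's two-sided p-values use the normal approximation, multiplied by the number of pairwise comparisons for the Bonferroni adjustment. The p-value for H itself comes from a χ² distribution with df degrees of freedom (for example via `scipy.stats.chi2.sf(h, df)`), which is left out to keep the block dependency-free.

```python
from itertools import combinations
from math import erfc, sqrt


def kruskal_dunn(groups: dict[str, list[float]]):
    """Tie-corrected Kruskal-Wallis H plus Dunn's pairwise z-tests with a
    Bonferroni adjustment. Returns (H, df, pairwise), where pairwise maps
    (label_a, label_b) -> (z, Bonferroni-adjusted two-sided p)."""
    labels = list(groups)
    pooled = sorted(((v, g) for g in labels for v in groups[g]), key=lambda t: t[0])
    n_total = len(pooled)

    # Assign average ranks over ties and accumulate the tie term sum(t^3 - t).
    rank_sum = {g: 0.0 for g in labels}
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        shared = (i + j) / 2 + 1
        for m in range(i, j + 1):
            rank_sum[pooled[m][1]] += shared
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    n = {g: len(groups[g]) for g in labels}
    h = 12 / (n_total * (n_total + 1)) * sum(rank_sum[g] ** 2 / n[g] for g in labels)
    h -= 3 * (n_total + 1)
    h /= 1 - tie_term / (n_total ** 3 - n_total)       # tie correction
    df = len(labels) - 1

    # Dunn's test compares mean ranks, with the same tie adjustment in the variance.
    mean_rank = {g: rank_sum[g] / n[g] for g in labels}
    var = n_total * (n_total + 1) / 12 - tie_term / (12 * (n_total - 1))
    pairs = list(combinations(labels, 2))
    pairwise = {}
    for a, b in pairs:
        z = (mean_rank[a] - mean_rank[b]) / sqrt(var * (1 / n[a] + 1 / n[b]))
        p = erfc(abs(z) / sqrt(2))                       # two-sided normal p-value
        pairwise[(a, b)] = (z, min(1.0, p * len(pairs)))  # Bonferroni
    return h, df, pairwise


h, df, pairwise = kruskal_dunn({"x": [1, 2, 3], "y": [4, 5, 6]})
print(round(h, 3), df)  # → 3.857 1
```

For cross-checking, `scipy.stats.kruskal` reports the same tie-corrected H statistic.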
Scale Satisfaction Survey Questions (Study 2)
Two Likert-type survey questions, each scored from 0 to 10, assessed SMA community satisfaction with EVOLVE-SMA. The first question asked participants, “Do you feel that one of the letters of the scale accurately describes the current ability of you or the individual with SMA for whom you are answering?” The mean response was 6.9/10 (mode 7, range 0–10). A score >7 was reported by 64% (107) of participants, indicating that the scale “very much so” represented their function. Eleven percent (19) of participants scored the question <5/10, indicating that the scale did not fully represent their function. One participant commented, “It describes me pretty well.” For higher-functioning individuals who can walk community distances but have difficulty with longer distances, EVOLVE-SMA may better capture their function; as one caregiver stated: “She can walk independently, but we use a manual wheelchair for long distances.”
Responses to the second question, “Do you like the new functional scale for SMA?”, averaged 7.1/10 (mode 5, range 0–10). A score >7 was reported by 61.6% (103) of participants, indicating that they liked the scale to some degree. Nine percent (15) of participants scored the question <5/10, indicating that they disliked the scale. Among the comments, one participant said, “I think this scale would be much more accurate and specific than being labeled a type 3.” Another reported, “I like that the ‘letter’ stages are relatively close to each other in terms of function. It seems more accurate and better describes what an individual is capable of.”
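The satisfaction results report a mean, a mode, a range, and the share of responses above and below two cut-offs. The text does not say whether “>7” means strictly above 7 or 7 and above, so this hypothetical helper makes both cut-offs explicit parameters (strict inequalities by default); the function name and defaults are assumptions for illustration, not the authors' procedure.

```python
from statistics import fmean, multimode


def summarize_likert(scores: list[int], high_cut: int = 7, low_cut: int = 5) -> dict:
    """Summarize 0-10 ratings: mean, mode(s), range, and shares beyond two cut-offs.

    Hypothetical helper: "high" means strictly above high_cut and "low" means
    strictly below low_cut, mirroring the ">7" and "<5" wording in the text.
    """
    n = len(scores)
    n_high = sum(s > high_cut for s in scores)
    n_low = sum(s < low_cut for s in scores)
    return {
        "mean": round(fmean(scores), 1),
        "modes": multimode(scores),          # lists every value if the mode is tied
        "range": (min(scores), max(scores)),
        "n_high": n_high,
        "pct_high": round(100 * n_high / n, 1),
        "n_low": n_low,
        "pct_low": round(100 * n_low / n, 1),
    }


print(summarize_likert([7, 7, 8, 10, 3, 0]))
```

Passing `high_cut=6` counts scores of 7 and above as high, which is the alternative reading of “>7”.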
EVOLVE-SMA Feasibility (Study 2)
Participants completed the entire survey, including the serial letter-specific EVOLVE-SMA questions, the Likert items on scale satisfaction, and demographic items, in less than 15 min on average, demonstrating that reading the descriptors and selecting the appropriate EVOLVE-SMA tier is feasible within a clinic visit. Of the 100 participants with SMA who could complete the survey, only three needed physical support from a caregiver to answer it. All clinicians participating in the retrospective and in-person assessments could apply the EVOLVE-SMA descriptors correctly with minimal training.
Discussion
We describe the development and the reliability and validity testing of a new functional motor assessment, EVOLVE-SMA, designed to characterize hierarchically the broad spectrum of functional motor abilities affected in individuals with SMA. EVOLVE-SMA categorizes SMA-associated functional abilities with greater granularity than the historical “types,” with excellent reliability among all evaluator groups. The development of this new assessment is a response to the remarkable changes in the natural history of SMA following DMT. Most people with SMA and their caregivers report that EVOLVE-SMA accurately describes their functional ability. Together, these characteristics suggest that, with additional research, EVOLVE-SMA could serve as both a classification scale and an assessment of function, with potential value for both research and clinical care.
EVOLVE-SMA is designed for the current and future era of increased access to SMA DMTs, which should, over time, improve the level of function across the entire SMA population. But complete resolution of SMA-related functional impairment in the population is likely unattainable, for many reasons.39 For the coming decades, we can expect that the current generation of individuals with SMA, many of whom received DMT only after the onset of manifest weakness, will continue to live with stable or near-stable functional weakness. Unfortunately, some individuals with SMA are also likely to have treatment delayed by late diagnosis or limited access to care. Worldwide, DMTs are approved inequitably across the spectrum of disease.40 Another factor limiting the response to DMT is that many infants manifest features of irreversible motor neuron degeneration before birth.41,42 The natural history of the approximately 60% of individuals born with severe SMA genotypes who receive DMT soon after birth is yet to be fully understood, as the treated cohort is still young and further growth and development of this group will likely reveal new phenotypes.6,43 We also do not know whether the available DMTs, as presently given, fully resolve the process of neurodegeneration. Notwithstanding these limitations, the aggregate functional abilities of the SMA population should improve over time, as a greater proportion of individuals receive DMT before, or soon after, the onset of degeneration. Because EVOLVE-SMA is easy and inexpensive to use without special training, it is uniquely suited to describing and documenting long-term trends in function and related disability in SMA. EVOLVE-SMA is also well suited to assessing the functional impact of changes in policy, the regulatory environment, and new therapies. Population-based studies using EVOLVE-SMA may assess the economic impact of functional tier on health care costs.
This monitoring would be enhanced if EVOLVE-SMA tier assignments were to become widely adopted for descriptive classification in routine clinical care, as SMA “type” designations were in the past.
Where EVOLVE-SMA fits in the array of other available assessments
SMA has been the subject of a vast amount of clinical research, for which a wide range of scales serving multiple purposes have been developed. These scales fall into several general categories:
“Classifications” mostly sort conditions according to nosologic distinctions such as genetic cause; their value in predicting clinical or research outcomes varies with the specifics of those distinctions.14,44,45 The “type” classification, which parses the clinical spectrum of untreated SMA, was useful for prognosis and care planning and has been incorporated into clinical trial inclusion/exclusion criteria to match participants to study outcome measures. However, in the modern DMT era, its reliance on a developmental peak of function and a defined age of onset no longer applies. The poor correlation between “type” and EVOLVE-SMA tier in this study demonstrates the diminishing relevance of the “type” classification. Pure classification scales also imply an immutable quality that runs counter to the improvement sought through therapeutic development, and because their design focuses on categorizing abilities rather than measuring incremental progress, they are inherently limited in detecting change in clinical or functional outcomes.
Ordinal scales of function have been the primary outcomes of the clinical trials that established the efficacy of DMT. Scales that address function are often called CLINROs (clinician-reported outcomes) or PERFOs (performance outcomes). These scales, most prominently the Hammersmith family, the Motor Function Measures, RULM, and CHOP-INTEND, all rely on scores assigned by trained clinical evaluators.36,37,46–48 Their use in clinical care outside of research is limited by cost, time, patient cooperation, the need for specially trained evaluators, and, in SMA, the frequent presence of disease-related complications (e.g., scoliosis, contracture, or pulmonary limitations) that confound their value as an assessment of SMA-associated skeletal muscle weakness. Ordinal scales also face a conceptual hurdle in establishing a minimum threshold of change that is meaningful to people living with SMA, and this threshold ambiguity undermines the clarity of regulatory agencies’ assessments.
Patient-reported outcome measures (PROMs) are particularly valuable in research for assessing patient experiences that are difficult or impossible to measure directly. They are typically administered at specified times in a protocol to reduce variability, and they target a range of patient experiences and opinions, from functional ability to the perceived balance of burden and benefit of therapies. PROMs can be crucial to regulatory agencies’ decisions, but grounding their results in objective findings or understanding the sources of variance is not straightforward. PROMs are not suitable for all ages or for individuals with cognitive limitations, and they are not easily validated between patients and caregivers or between different caregivers.49–53
Other assessments used in clinical trials, such as timed tests, strength outcomes, and physiologic or biochemical measures, have similar limitations: they require training or specialized equipment or conditions, and their results are difficult to relate to meaningful function. Many target only subgroups within the wide range of SMA phenotypes.54–57
EVOLVE-SMA combines elements of many of these forms of scales. Although in practice assigned tiers will likely remain relatively fixed once a developmental plateau is reached, both declines and advances remain possible. Declines due to accumulating downstream complications, such as deformity, respiratory compromise, and impaired nutritional status, are a major concern of clinical care; against these, stability of function is now the appropriate therapeutic target for adults with SMA. Improvements after that plateau has been reached have been described with new muscle-targeted therapies.58 Further studies will be needed to understand the role of EVOLVE-SMA in monitoring such changes. The ease of administration, measurable reliability, and assessment of functional qualities that are each associated with the level of independent functioning combine to create a new form of assessment. EVOLVE-SMA can provide a tool that characterizes both individuals with SMA and the SMA population in this DMT era. EVOLVE-SMA was also conceived to support longitudinal trend analysis, patient stratification, and resource planning.
The relationship of EVOLVE-SMA to other factors influencing SMA phenotype
The poor relationship between “type” and function in the post-DMT era has led recent clinical trials to qualify SMA “type” in their inclusion criteria with functional ability descriptors, e.g., “non-sitter type 2/3.”58 In the population used to develop EVOLVE-SMA, which included individuals treated early, before much neurodegeneration had occurred, as well as individuals treated late, after a wide range of disease progression, the relationship of “type” to the level of functional impairment was poor. The spread across all 12 tiers of participants who self-identified as having been diagnosed with each “type” of SMA reflects the evolution of SMA phenotype severity (Table 4).
While “type” classifications no longer characterize individuals and populations with SMA well, two characteristics have emerged as immutable factors that affect phenotype severity and can thus properly be used to classify SMA: SMN2 copy number and age at initiation of DMT. SMN2 copy number is inversely related to phenotype severity.13,59 Age at initiation of DMT has a life-long impact on function through the magnitude of pre-treatment neurodegeneration. However, because our mixed cohort had a highly variable age at DMT initiation, along with the full range of acquired confounders of function, SMN2 copy number was a weaker independent predictor of EVOLVE-SMA tier (Supplemental Table 4). This discrepant finding highlights the changing landscape of SMA. As the population treated early with DMT grows, the residual variation in EVOLVE-SMA tier not explained by SMN2 copy number may reveal other influences on phenotype severity that have potential therapeutic value.
Relationship of EVOLVE-SMA to other SMA-specific outcome measures
As expected from the initial anchors the authors used to develop EVOLVE-SMA, HFMSE scores and tier are strongly correlated (r > 0.96). Somewhat unexpected was the strength of the correlation between RULM and EVOLVE-SMA tier (r > 0.89), because EVOLVE-SMA tiers do not focus on upper extremity function. Previous research has reported correlations of varying strength between RULM and HFMSE.60 Further exploration of the relationship between SMA-specific functional scale scores and EVOLVE-SMA tiers may enable the tiers to serve as anchors for establishing minimal clinically important differences for SMA-specific OMs, leading to a streamlined approach to assessment in the clinical or research setting.
Limitations, insights, and opportunities for improvement
The development of EVOLVE-SMA necessitated some constraints that potentially limit its broader applicability. The initial retrospective stage of development took place at three centers of excellence in SMA care and research, which may introduce some bias. All three sites treated children, but only two treated adults with SMA. Although the proportion of DMT-treated patients at these sites (89.2%, n = 166) may have been higher than in the national population, the increasing prevalence of DMT treatment in the United States should narrow this gap.61 Each site participated in the initial DMT clinical trials and, given its clinic size and urban location, served patients with a broad range of phenotypes and ages. The combined distribution of assessed tiers at these sites and in the prospectively evaluated cohort at a national SMA family conference (Figure 2) provides the best available, though unverifiable, baseline estimate of the present functional impairment of the U.S. SMA population. The prospective cohort may be biased toward greater access to resources and DMTs or greater interest in research participation, as participants who attend a large research/advocacy meeting may have a stronger connection to the SMA community and its resources.
A key consideration in creating this scale was to represent a hierarchical array of functional difficulty. This consideration re-emerged when 14% (23) of community participants, in answering the serial survey questions on functional ability, reported the ability to meet a higher tier together with an inability to meet a lower tier. On review of these participants’ data, the reports of 16 participants were judged likely to be inaccurate, owing to context errors (e.g., stating that they can walk but that in the last 4 weeks they have not sat in a wheelchair lacking head support) or to variability in function. Of the remaining seven (4%) community participants with this pattern, four reported independent walking in the community (Tiers A and C) and three reported household ambulation (Tier D), yet all seven were unable to meet the criteria for the much lower 4-point position and crawling tier (Tier F). These seven, all adults, demonstrate how upright posture and balanced walking are possible with minimal proximal muscle power. Some people with SMA walk by using self-stabilizing positions while transitioning between motor tasks, a more challenging strategy. Adults may functionally abandon 4-point positioning and crawling when alternative means of mobility are more efficient, or because these activities may be culturally inappropriate for adults to perform. On the other hand, the ability to assume a 4-point position and crawl short distances may become meaningful after a fall, allowing more independent recovery. The cultural implications of crawling should also be explored further as EVOLVE-SMA is implemented in different countries. In future use of EVOLVE-SMA, improved tier-assignment instructions will emphasize that the tier is assigned at the highest level of functional ability achieved.
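The non-hierarchical response pattern described above, reporting a higher tier as met while a lower tier is not met, can be screened for automatically. The sketch below is hypothetical: it assumes the 12 tiers are lettered A through L in descending order of function and that each serial question is recorded as True (met), False (not met), or left unanswered; the function name is illustrative only.

```python
# Hypothetical lettering: 12 tiers, A = highest function, L = lowest.
TIERS = "ABCDEFGHIJKL"


def inconsistent_pairs(answers: dict[str, bool]) -> list[tuple[str, str]]:
    """Return (higher, lower) tier pairs where the higher tier was reported as
    met but the lower tier was reported as not met. Unanswered tiers are ignored.
    """
    met = [t for t in TIERS if answers.get(t) is True]
    not_met = [t for t in TIERS if answers.get(t) is False]
    # A smaller index means a higher tier, so hi must come before lo in TIERS.
    return [(hi, lo) for hi in met for lo in not_met if TIERS.index(hi) < TIERS.index(lo)]


# Example mirroring the pattern in the text: walks (C, D) but cannot crawl (F).
print(inconsistent_pairs({"C": True, "D": True, "F": False}))
# → [('C', 'F'), ('D', 'F')]
```

A flagged response still needs review, since, as the text notes, some of these patterns reflected context errors and others reflected genuine abilities.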
During the development of EVOLVE-SMA, we learned that in-person evaluation and questioning about functional ability improved reliability compared with assessment by retrospective chart review, indicating that prospective use of the scale reduces documentation bias. Nevertheless, in research settings with good therapist and clinician documentation, and with this caution in mind, we believe the reliability of this form of assessment is sufficient for future retrospective or prospective studies of similar design to yield valuable insights.
In the community-participant stage of EVOLVE-SMA development, a discrepancy emerged between the survey “Self-Assigned Tier” and the “Questionnaire-Derived Tier” based on participants’ answers, even though interrater reliability between them was excellent (ICC 0.96, CI 0.95–0.97, p < 0.001) (Table 5). The issue might stem from ambiguity in the survey question instructions, particularly for items that require substantial effort or that participants self-limit out of safety concerns. Conversations with participants revealed that the highest level they achieved in the past four weeks may differ from what they chose to do daily, even when they are physically capable of meeting the defining tier threshold. We did not conduct interviews specifically probing these observations, but we propose to evaluate whether substituting the introduction “In the last month, have you….” reduces the ambiguity between the interpretations that emerged from the questions “Are you able?” and “Have you done?” Additional work to assess translations of EVOLVE-SMA will be worthwhile for international distribution. In addition, we believe that clarifying the assignment instructions to start at Tier A and move stepwise down will help mitigate survey fatigue.
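The proposed instruction to start at Tier A and move stepwise down amounts to a short-circuit search: the first tier whose threshold is met is, by construction, the highest level achieved, and no lower tiers need to be asked. The following is a minimal sketch, assuming hypothetical A–L lettering and a callback (`can_meet`, an illustrative name) that stands in for asking one serial question; it also counts how many questions were asked, as a rough proxy for respondent burden.

```python
from typing import Callable, Optional

# Hypothetical lettering: 12 tiers, A = highest function, L = lowest.
TIERS = "ABCDEFGHIJKL"


def assign_tier(can_meet: Callable[[str], bool]) -> tuple[Optional[str], int]:
    """Ask tiers from A downward and stop at the first threshold met.

    can_meet(tier) stands in for one serial survey question. Returns the
    assigned tier (None if no threshold is met) and the number of questions asked.
    """
    asked = 0
    for tier in TIERS:
        asked += 1
        if can_meet(tier):
            return tier, asked      # highest level achieved; lower tiers are skipped
    return None, asked


# Example: a respondent who meets Tier D and every tier below it.
print(assign_tier(lambda t: t in "DEFGHIJKL"))  # → ('D', 4)
```

Because questioning stops at the first tier met, an inability at a lower tier (such as 4-point positioning in an adult who walks) never lowers the assigned tier, which matches the rule of assigning the highest level achieved.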
Future studies of EVOLVE-SMA will be valuable, as its development is not yet complete. Further work is needed to determine how it can best serve as a CLINRO or PERFO in short-term treatment-response studies. Formal minimal clinically important difference studies62 will be important to establish the statistical meaningfulness of the scale itself, in parallel with the intrinsic meaningfulness to individuals of each tier's functional abilities.26,63 Focus groups or surveys of individuals with SMA across the spectrum of disability could generate lists of day-to-day tasks within the capabilities of individuals at each EVOLVE-SMA tier. Studies correlating item-level scores on SMA-specific outcome measures with EVOLVE-SMA tiers may provide a streamlined approach to functional assessment in the clinical or research setting. Studying the impact on clinical management, and the correlation of EVOLVE-SMA tier with care needs and risk of complications, may facilitate a more individualized approach to health care needs and resources, similar to how the GMFCS and EDSS provide prognostic data for clinicians and families.
Monitoring change over time to examine the sensitivity of EVOLVE-SMA to change will be critical to understanding its use. Understanding the scale's ceiling and floor effects over time may help guide the selection of complementary norm-referenced assessments. Additionally, focus groups exploring the experience of both industry partners and SMA community members from a range of nations and cultures with EVOLVE-SMA may broaden its relevance as SMA care evolves. After appropriate translation, EVOLVE-SMA may help inform resource allocation and estimates of population burden internationally.
Conclusion
EVOLVE-SMA reliably assigns individuals with SMA aged two years and older to one of 12 functional ability tiers distributed across the broad range of SMA. The scale is simple, easy to understand, and paper-based, allowing it to be used reliably by clinicians across various specialties and experience levels and by individuals with SMA and their caregivers. Because it identifies levels of motor ability that affect independent functioning, it was well received by members of the SMA community as representative of their function.
EVOLVE-SMA is well positioned to address many needs of the new DMT era. As an evaluation of functional ability, it characterizes an individual's function more clearly than the classical “types.” EVOLVE-SMA can track the new natural history of SMA in this DMT era, and we hope it will support care planning and goal setting by rehabilitation professionals as a way to characterize individuals with SMA. As a research instrument, it can enable targeted research on SMA subgroups, either in stratified designs to enhance statistical power or possibly as an outcome measure itself. The ease of assignment without specialized training or equipment opens opportunities for economic research to evaluate costs and policies regarding SMA care. The distributions of functional ability established during this development offer a baseline for future comparisons of SMA cohorts. Equally important to these broader research goals, EVOLVE-SMA has the potential to improve individual clinical care. EVOLVE-SMA provides a common language among clinicians and researchers that supports characterization and further understanding of SMA in the new era of DMT treatment.
Supplemental Material
sj-docx-1-jnd-10.1177_22143602251405346 - Supplemental material
Footnotes
Abbreviations
Acknowledgements
We thank all the participants and families who completed the survey at the community-based event. Thank you to Cure SMA for supporting the completion of the survey at the Annual Conference. We wish to acknowledge members of the EVOLVE-SMA Working Group for their invaluable support and expertise:
Authors’ contributions
MMB conceived and initiated the study design with KJK, while TOC and SA assisted with implementation and support within their specific sites. MMB, KJK, and TOC collaborated on conceiving the purpose of a new assessment and creating the initial tier scales. MMB and the EVOLVE-SMA Working Group (all authors) were responsible for data collection and scale implementation. MVO provided non-blinded study support during survey collection. The following study team members assessed prospective participant tier assignment following self-assignment: MMB, TOC, LB, JN, KJK, TC, CK, RP, ASc, ASt, and JT. MMB conducted the primary statistical analysis, and KJK oversaw the implementation. MMB drafted the manuscript, which was critically reviewed and revised by KJK, TOC, SA, KJ, and TD. All authors helped refine the study protocol and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: MMB and MVO received logistical and financial support from Cure SMA. Cure SMA provided a room for survey administration during the Annual Conference, funding for demonstration equipment, and a research assistant (MVO). CHCO's REDCap supported this project under NIH/NCATS Colorado CTSA Grant Number UL1 TR002535. Its contents are the authors’ sole responsibility and do not necessarily represent official NIH views.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: MMB serves on advisory boards and/or as a consultant for Biogen, Scholar Rock, WCG, Aspa Therapeutics, and CureSMA and receives grant funding from Dyne Therapeutics.
TOC serves on advisory boards for Biogen, Scholar Rock, Sarepta, Catalyst, Avexis, Cure SMA, and Muscular Dystrophy Association.
LB served as a speaker for Biogen and Sarepta.
KH served on an advisory board for Genentech-Roche.
NLK serves on advisory boards for Argenx, Astellas, Biogen, Catalyst, Genentech, Novartis, and Sarepta and has DSMB and consulting activities for Sarepta.
JN serves on an advisory board for Catalyst.
VR serves on advisory boards, as a speaker, and/or in a consultant role for Avexis, Biogen, Genentech-Roche, Scholar Rock, PTC Therapeutics, NSPharma, Regenxbio, Sarepta Therapeutics, France Foundation, CureSMA, and Muscular Dystrophy Association. After completing work on this project, VR joined Ultragenyx.
TD serves on advisory boards and/or as a consultant for Scholar Rock, CureSMA, DuchenneUK, Roche, and Biogen, and is a consultant for Biogen, Dyne, Trinds, Roche, and Genentech.
SA serves on advisory boards for Dyne, Sarepta, ITF, and PPMD.
KJK serves on advisory boards and/or in a consultant role for Biogen, Cure SMA, Scholar Rock, and Aspa Therapeutics and receives institutional grant support from Scholar Rock and Biogen via Ann & Robert H. Lurie Children's Hospital.
CK serves on an advisory board for Catalyst. AST serves as a consultant for Scholar Rock. JT serves as a consultant for ASPA Therapeutics.
KJ, TC, JM, JN, RP, CR, ASc, ST, MV, and KV have nothing to report.
Data availability statement
The data supporting the findings of this study are available upon reasonable request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
References
