Abstract
Background:
Patient-reported outcome measures (PROMs) are often used in clinical research, but little is known about their performance as longitudinal outcomes.
Methods:
We used data from ASCEND, a large SPMS trial (n = 889), to investigate changes on the Short Form Health Survey 36 (SF-36 v2) and the Multiple Sclerosis Impact Scale (MSIS-29) over 2 years of follow-up.
Results:
PROM scores changed little over the 2 years of follow-up. In contrast to physical disability measures, there was no consistent trend in PROM change: significant worsening occurred about as often as improvement. Using a 6-month confirmation reduced the number of both worsening and improvement events without altering their relative balance. There was no clear difference in worsening events in groups based on population characteristics, nor was there a noticeable effect using different thresholds for clinically significant change.
Conclusion:
We found little consistent change in MSIS-29 and SF-36 over 2 years of follow-up in people with SPMS. Our findings show a disconnect between disability worsening and PROM change in this population. They raise caution about the use of these PROMs as primary outcome measures in SPMS trials and call for a critical reappraisal of their longitudinal use in such trials.
Introduction
Patient-reported outcome measures (PROMs) are frequently used to evaluate patients’ perspectives on disability, wellbeing, and the impact of disease. These constructs fall under the overarching concept of health-related quality of life (HRQOL). The two most commonly used measures of HRQOL in multiple sclerosis (MS) are the Medical Outcomes Study Short Form Health Survey (SF-36) 1 and the Multiple Sclerosis Impact Scale (MSIS-29). 2 The SF-36 investigates physical and social functioning using eight subscales and has been used in many diseases. The SF-36 can be summarised into two summary scores, the mental health component score (MCS) and the physical health component score (PCS). 3 The MSIS-29 was developed as an MS-specific PROM. The MSIS-29 is a 29-item questionnaire measuring the perceived impact of disability on activities of daily living and wellbeing. The MSIS-29 similarly can be summarised in two scores: the physical (MSIS-Physical) and psychological (MSIS-Psychological) summary score.
Both of these PROMs are often used as secondary outcomes in clinical trials, in value-based health care initiatives, and even in marketing authorisation procedures for investigated compounds in all forms of MS. Both instruments have good psychometric properties in a cross-sectional context, but their usefulness as longitudinal outcome measures in MS is not well established, especially across the spectrum of MS subtypes.
In this investigation, we used data from a large phase III randomised controlled trial in secondary-progressive multiple sclerosis (SPMS) to describe longitudinal change on these measures and to determine their usefulness as outcomes in the setting of a clinical trial. We investigated significant worsening and similarly defined improvement in these PROMs over 2 years of follow-up and compared unconfirmed and confirmed significant change. We also investigated how baseline factors such as sex, treatment arm, or disability at baseline impact worsening, and explored different threshold definitions for clinically significant change.
Methods
ASCEND data set
ASCEND was a randomised, double-blind, placebo-controlled, two-arm trial of natalizumab treatment in SPMS. 4 The inclusion criteria were age 18 to 58 years inclusive, SPMS for 2 or more years, disability progression over the previous year, a screening Expanded Disability Status Scale (EDSS) 5 score of 3.0 to 6.5 inclusive, and a Multiple Sclerosis Severity Score (MSSS) 6 of 4 or more. Patients with a clinical relapse in the 3 months before inclusion were excluded, as were patients with a timed-25-foot walk test (T25FW) 7 of more than 30 seconds. In ASCEND, SPMS was defined as relapsing-remitting disease followed by progressive disability independent of or not explained by MS relapses for at least 2 years prior to inclusion. The trial did not show a treatment benefit of natalizumab over placebo.
SF-36 and MSIS-29 assessments
In ASCEND, trial participants completed MSIS-29 and SF-36 (v2) questionnaires at baseline, and then at 24, 48, 72, and 96 weeks. For this analysis, we calculated MSIS-29 Physical and Psychological scores for each time point. MSIS-29 Psychological and Physical scores can range from 0 to 100, with higher scores indicating worse HRQOL. 2 We calculated SF-36 Physical Component Summary (PCS) and Mental Component Summary (MCS) scores for each of these time points. SF-36 PCS and MCS scores range from 0 to 100, with higher scores indicating better HRQOL.1,3 For the MSIS-29 Physical and Psychological scores, we defined significant worsening as an increase by 8 or more points compared to baseline.8,9 For the SF-36 PCS and MCS scores, we defined a 5-point or more decrease from baseline as significant worsening.10,11 To compare PROMs with physical disability worsening, we chose the physical disability outcomes EDSS and the T25FW. We defined significant worsening on the EDSS as an increase by one whole point if the comparator EDSS was 5.5 or lower, and by one-half point if the comparator EDSS was 6.0 or 6.5. We defined significant worsening on the T25FW as a 20% increase in the time needed to complete the T25FW (average of two trials). 12 We also explored the association of the baseline characteristics sex, EDSS at baseline, and trial arm with worsening on the PROMs during follow-up.
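The worsening definitions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code (the analyses were run in R); the function names are hypothetical, and the sketch assumes that improvement on the PROMs mirrors worsening at the same threshold in the opposite direction.

```python
def classify_prom_change(baseline, follow_up, threshold, higher_is_worse):
    """Classify a PROM score change as 'worsened', 'improved' or 'stable'.

    MSIS-29: higher_is_worse=True, threshold 8 points.
    SF-36 PCS/MCS: higher_is_worse=False, threshold 5 points.
    Assumption: improvement is the mirror image of worsening.
    """
    delta = follow_up - baseline
    if not higher_is_worse:
        delta = -delta  # after this flip, a positive delta always means worsening
    if delta >= threshold:
        return "worsened"
    if delta <= -threshold:
        return "improved"
    return "stable"


def edss_worsened(comparator, follow_up):
    """EDSS: +1.0 point if the comparator is 5.5 or lower, +0.5 if it is 6.0 or 6.5."""
    required = 1.0 if comparator <= 5.5 else 0.5
    return follow_up - comparator >= required


def t25fw_worsened(baseline_seconds, follow_up_seconds, relative_increase=0.20):
    """T25FW: an increase of at least 20% in the walk time (average of two trials)."""
    return (follow_up_seconds - baseline_seconds) / baseline_seconds >= relative_increase
```

For example, `classify_prom_change(50, 58, 8, True)` applies the MSIS-29 rule, and `classify_prom_change(33, 28, 5, False)` applies the SF-36 rule.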
Statistical analysis
Average change in HRQOL over time
We calculated the change in PROM summary scores between baseline and each trial visit.
Proportion of patients with unconfirmed and confirmed clinically significant change
We calculated the percentage of patients with unconfirmed, 6-month confirmed, and 12-month confirmed significant change (improvement or worsening) in HRQOL at each visit compared to baseline. To explore changes in HRQOL occurring between later time points in the study, we calculated the percentage of patients with significant HRQOL change between baseline and 24 weeks, between 24 and 48 weeks, between 48 and 72 weeks, and between 72 and 96 weeks of follow-up. We used Student’s t-tests to compare the change in PROM summary scores between patients with and without significant worsening of the EDSS and T25FW in these same intervals.
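The following Python sketch illustrates one way to operationalise confirmed and interval ('rebaselined') change. It rests on assumptions the text does not spell out: visits are 24 weeks apart, 6-month confirmation means the threshold is still met at the next visit, and 12-month confirmation means it is met at the next two visits. The function names are hypothetical and missing visits are not handled.

```python
from typing import List, Optional, Sequence


def worsening_from_baseline(scores: Sequence[float], visit: int,
                            threshold: float, higher_is_worse: bool) -> bool:
    """True if the score at `visit` is worse than baseline (index 0) by >= threshold."""
    sign = 1.0 if higher_is_worse else -1.0
    return sign * (scores[visit] - scores[0]) >= threshold


def confirmed_worsening(scores: Sequence[float], visit: int, threshold: float,
                        higher_is_worse: bool, confirm_visits: int) -> Optional[bool]:
    """Worsening at `visit` that persists at the next `confirm_visits` visits.

    Assumption: confirm_visits=1 stands for 6-month and 2 for 12-month confirmation.
    Returns None when there are not enough later visits to confirm the event.
    """
    last = visit + confirm_visits
    if last >= len(scores):
        return None
    return all(worsening_from_baseline(scores, v, threshold, higher_is_worse)
               for v in range(visit, last + 1))


def rebaselined_worsening(scores: Sequence[float], threshold: float,
                          higher_is_worse: bool) -> List[bool]:
    """Worsening in each interval, using the previous visit as the new baseline."""
    sign = 1.0 if higher_is_worse else -1.0
    return [sign * (later - earlier) >= threshold
            for earlier, later in zip(scores, scores[1:])]
```

Here `scores` holds one participant's summary scores at baseline and weeks 24, 48, 72, and 96, in that order.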
Effect of different thresholds of the definition of significant change
To explore the importance of the definition of the threshold for significant change, we calculated the proportions of patients with HRQOL change at different cut-off scores. In addition to the generally used threshold of 8 points for the MSIS, we explored ‘any change’, ‘4 point’ and ‘16 point’ thresholds for MSIS Physical and MSIS Psychological scores. In addition to the generally used 5-point threshold for the SF-36, we explored ‘any change’, ‘2 point’ and ‘10 point’ thresholds for the SF-36 PCS and MCS scores.
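A threshold comparison of this kind can be sketched as follows. The sketch is illustrative rather than the authors' code: the function name and the example changes are invented, and it assumes that 'any change' means any non-zero change and that changes have been oriented so that positive values indicate worsening.

```python
def proportions_by_threshold(worsening_deltas, thresholds):
    """Return {label: (proportion worsened, proportion improved)} for each threshold.

    `worsening_deltas` holds score changes oriented so that positive = worse.
    A threshold of None stands for 'any change' (assumed: any non-zero change).
    """
    n = len(worsening_deltas)
    result = {}
    for label, cut in thresholds.items():
        if cut is None:
            worse = sum(d > 0 for d in worsening_deltas)
            better = sum(d < 0 for d in worsening_deltas)
        else:
            worse = sum(d >= cut for d in worsening_deltas)
            better = sum(d <= -cut for d in worsening_deltas)
        result[label] = (worse / n, better / n)
    return result


# Hypothetical MSIS-29 changes from baseline for eight participants (not trial data).
example_deltas = [10, 5, 0, -3, -9, 17, -20, 2]
msis_thresholds = {"any": None, "4-point": 4, "8-point": 8, "16-point": 16}
example_result = proportions_by_threshold(example_deltas, msis_thresholds)
```

For the SF-36, the same function would be called with thresholds of 2, 5, and 10 points after flipping the sign of the changes, since higher SF-36 scores indicate better HRQOL.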
Association of baseline characteristics and significant PROMs worsening
We used contingency tables and chi-square tests to investigate the associations of the baseline characteristics sex, EDSS at baseline, and treatment arm with significant PROM worsening. We used the R statistical software package version 4.0.5 for all statistical analyses. 13 We considered two-tailed p values below 0.05 statistically significant.
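For a 2 × 2 contingency table (for example, sex by significant worsening), the chi-square test can be sketched in plain Python as below. This is a minimal illustration, not the authors' R code. It computes the Pearson statistic without a continuity correction, which is an assumption: the text does not state whether one was used, and R's `chisq.test` applies Yates' correction to 2 × 2 tables by default. The function name and the counts in the test are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Assumptions: no continuity correction; all row and column totals are non-zero.
    Returns (chi-square statistic, two-sided p value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, chi2 is the square of a standard normal variable,
    # so the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

The p value returned would then be compared against the 0.05 level used throughout the analysis.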
Data availability
The data used in this study is available upon request from Biogen. Individual participant data collected during the trial is shared after anonymization and on approval of a research proposal and data sharing agreement. Research proposals can be submitted online (www.biogenclinicaldatarequest.com).
Results
ASCEND data set
The ASCEND data set contained data on 889 patients. Table 1 shows their baseline characteristics.
Baseline clinical, imaging and HRQOL characteristics at screening in the ASCEND data set. Higher scores on MSIS-29 indicate worse HRQOL, higher scores on the SF-36 indicate better HRQOL.
SD: standard deviation; MSIS: Multiple Sclerosis Impact Scale; EDSS: Expanded Disability Status Scale; IQR: interquartile range; PCS: Physical Component Summary; MCS: Mental Component Summary; NHPT: Nine Hole Peg Test; SDMT: Symbol Digit Modalities Test.
Change in HRQOL summary scores
SF-36 and MSIS-29 summary scores changed little over the 2-year duration of this trial. The MSIS-29 Physical score was 50.8 (SD 20.2) at baseline, compared to 50.5 (SD 23.3) at week 96. The MSIS-29 Psychological score was 39.1 (SD 22.4) at baseline, compared to 36.7 (SD 23.9) at week 96. The SF-36 PCS was 33.3 (SD 7.9) at baseline, compared to 33.5 (SD 8.6) at week 96, and the SF-36 MCS was 47.0 (SD 10.6) at baseline, compared to 47.7 (SD 10.7) at week 96. 14 There was little overall change in PROM summary scores from baseline throughout the trial (Figure 1). The median change in all summary scores was about equal in the positive and the negative direction and averaged approximately zero (Figure 1).

Change from baseline in the four HRQOL summary scores: MSIS Physical (a), MSIS Psychological (b), SF-36 PCS (c), and SF-36 MCS (d). There is overall little change in all four investigated summary scores over the 96 weeks of follow-up in ASCEND. PCS: Physical Component Summary, MCS: Mental Component Summary.
Significant PROM change over time
For the MSIS-29 Physical score, the percentage of patients with unconfirmed significant worsening increased slightly but steadily throughout the trial, from 26.9% at week 24 to 32.1% at week 96 (Table 2 and Figure 2). Worsening on the other PROM summary scores did not show a consistent pattern and was quite stable over the course of follow-up. Remarkably, the proportion of participants with significant improvement on the PROM summary scores compared to baseline was similar to or higher than the proportion with significant worsening (Table 2). These findings on PROM worsening stand in contrast to worsening on the physical disability measures EDSS and T25FW, which showed a steady increase in worsening events (Table 2 and Figure 2).
Proportion of patients with significant change (worsening or improvement) in EDSS, T25FW and HRQOL measures over 2 years of follow-up compared to the baseline visit.
EDSS: Expanded Disability Status Scale; MSIS: Multiple Sclerosis Impact Scale; PCS: Physical Component Summary; MCS: Mental Component Summary; 6M: 6 months; 12M: 12 months; SD: standard deviation.
Higher scores on MSIS-29 indicate worse HRQOL, and higher scores on the SF-36 indicate better HRQOL.

Significant unconfirmed worsening and improvement on the EDSS (a), T25FW (b) and the four HRQOL summary scores: MSIS Physical (c), MSIS Psychological (d), SF-36 PCS (e), and SF-36 MCS (f) over the course of the trial compared to baseline. While the disability outcomes EDSS and T25FW show a steady increase in worsening events throughout follow-up, there is little change and no consistent trend in the HRQOL measures. Throughout the trial, participants were at least as likely to report improvement as worsening in HRQOL.
Confirmed and unconfirmed significant PROM change over time
Requiring confirmation of change substantially reduced the percentages of participants with significant worsening and improvement. With 6-month confirmation, the proportion of participants with significant worsening on the MSIS Physical decreased from 26.9% (unconfirmed) to 15.8% (6-month confirmed) at week 24 (Table 2 and Figure 3). A similar change was seen for improvement: the proportion of participants with significant improvement on the MSIS Physical decreased from 31.8% (unconfirmed) to 20.2% (6-month confirmed) at week 24. These reductions were also seen at later time points and were similar for the other PROM summary scores. Using 12-month confirmation did not change these proportions substantially (Table 2). The physical outcomes of EDSS and T25FW were less affected by 6-month confirmation (Table 2 and Figure 2).

Unconfirmed and 6-month confirmed worsening on the EDSS (a), T25FW (b), and the four HRQOL summary scores: MSIS Physical (c), MSIS Psychological (d), SF-36 PCS (e), and SF-36 MCS (f) over the course of the trial. While 6-month confirmation always reduces the number of worsening events, this effect is much more pronounced for the HRQOL summary scores. This argues for an increased variability of these measures over time.
Additional analysis: PROM worsening in later time intervals
In our analyses of the proportion of participants with significant worsening from baseline, we found a remarkably large jump in worsening events from baseline to 24 weeks, but little change after week 24 (Table 2, Figure 2). To study whether there is a specific significance to this early period in the trial, we also determined the proportion of patients with significant HRQOL change between 24 and 48, 48 and 72, and 72 and 96 weeks (Table 3). The proportion of patients with significant worsening or improvement was strikingly similar in each epoch, suggesting that the large ‘jump’ in both worsening and improvement events between baseline and week 24 is due to the variability of the measures, rather than a specific event in the early phase of the trial.
Proportion of patients with significant change (worsening or improvement) in HRQOL measures over 2 years of follow-up compared to the previous visit (‘rebaselined’).
MSIS: Multiple Sclerosis Impact Scale; PCS: Physical Component Summary; MCS: Mental Component Summary.
Higher scores on MSIS-29 indicate worse HRQOL, higher scores on the SF-36 indicate better HRQOL.
To investigate the association of change in PROMs and disability worsening in these intervals, we compared the change in summary PROM scores between participants with and without significant EDSS and T25FW worsening. Participants with significant EDSS and T25FW worsening generally had scores suggestive of worse HRQOL, although these differences only rarely reached statistical significance (Table 4).
Mean change in HRQOL measures by unconfirmed significant EDSS and T25FW worsening over 2 years of follow-up compared to the previous visit (‘rebaselined’). Participants with worsening physical disability generally had score changes suggestive of worse HRQOL. These differences occasionally reached statistical significance.
MSIS: Multiple Sclerosis Impact Scale; EDSS: Expanded Disability Status Scale; PCS: Physical Component Summary, MCS: Mental Component Summary.
Higher scores on MSIS-29 indicate worse HRQOL, higher scores on the SF-36 indicate better HRQOL.
Threshold definitions of significant PROM change
To explore the influence of the threshold definition for significant change on worsening and improvement percentages, we repeated the analysis using different threshold definitions for each of the PROMs (any, 4-, and 16-point change for MSIS-29, and any, 2-, and 10-point change for SF-36). Using different cut-offs for defining significant change did not change the overall balance between worsening versus improvement events. In general, similar proportions of patients worsened or improved using any of the explored thresholds (Table 5).
Worsening and improvement proportions with different thresholds for change on the PROMs.
MSIS: Multiple Sclerosis Impact Scale; PCS: Physical Component Summary, MCS: Mental Component Summary.
Association of baseline characteristics and significant PROM worsening
In our investigation of the association of PROM worsening with sex, EDSS at baseline and treatment arm, only female sex was significantly associated with worsening on the MSIS-psychological score (Table 6). We found no other significant associations.
Contingency tables of significant PROM worsening by baseline disability status, sex, and treatment arm.
EDSS: Expanded Disability Status Scale; NTZ: natalizumab; MSIS: Multiple Sclerosis Impact Scale; PCS: Physical Component Summary, MCS: Mental Component Summary.
Chi-square test.
Discussion
In this cohort of people with steadily worsening physical disability, PROM scores showed little consistent change. Our investigation showed that participants were roughly as likely to worsen on these measures as they were to improve over the course of 2 years of follow-up. The lack of change in these outcomes stands in contrast to the physical disability measures EDSS and T25FW, which show a steady increase in worsening events. These findings are somewhat unexpected since the MSIS-29 and SF-36 are well-validated scales of HRQOL in MS and reflect functional impairment in cross-sectional studies. The MSIS-29 has good test–retest reliability, 10 and shows convergent validity with the EDSS. 11 In a previous investigation of the ASCEND data set, we found an association of disability worsening, especially on the T25FW and EDSS, with MSIS-29 and SF-36. 14 Based on these data, we had expected PROM summary scores to steadily worsen, and that worsening events would occur more frequently than improvement events. We chose the T25FW and EDSS as comparators because they are reliable measures of physical disability worsening in SPMS based on previous studies,15,16 but it should be kept in mind that both of these measures rely on ambulation. It is possible that the investigated PROMs are a better measure of other physical or mental functional domains that are not well quantified by the T25FW and EDSS.
One explanation for the similar proportions of worsening and improvement events may be that the MSIS-29 and SF-36 are simply not responsive enough to detect the clinical progression present in the specific population in this cohort. Much of the research on the psychometric properties of the MSIS-29 and SF-36 in MS comes from cross-sectional studies.2,17–26 Such studies showed a significant association between the physical PROM subscores and disability. However, this does not prove the usefulness of these measures as longitudinal outcome measures in a clinical trial. There are only a few studies of the responsiveness of longitudinal measurements of the MSIS-29 and SF-36 in MS.9,17,19 Ideally, responsiveness is based on one gold standard measuring the same construct,26,27 but often it is defined based on a change in a reference measure that represents clinical or performance status or global perceived effect after an intervention.27,28 For the MSIS-29 Physical subscore, the minimal clinically important difference (MCID) score of 8 was determined and validated using predefined significant worsening on the EDSS.8,9 To the best of our knowledge, no study has determined a specific MCID of the SF-36 in MS. Therefore, most often the value of one half of the standard deviation of a healthy standard population (which was 5 for the SF-36) is used.10,11 This approach is far from ideal since it is heavily dependent on the variance in the (healthy) reference population and may not reflect the changes MS patients find clinically relevant.
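The half-standard-deviation rule mentioned above amounts to a one-line calculation. The version below assumes norm-based SF-36 summary scores, which are standardised to a standard deviation of 10 in the reference population; that assumption is consistent with the value of 5 quoted in the text.

```latex
\[
\mathrm{MCID}_{\text{SF-36}} \approx \tfrac{1}{2}\,\mathrm{SD}_{\text{reference}} = \tfrac{1}{2} \times 10 = 5
\]
```

Because the result scales directly with the reference standard deviation, a different reference population would yield a different threshold.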
It is also unclear whether it is appropriate to use a single MCID definition for a heterogeneous disease such as MS. The sensitivity for detecting change can differ between disability strata or disease courses, and the population in which an MCID is used should always match the population in which the MCID was determined. For the MSIS-29, there is a different responsiveness in people with higher compared to lower disability: the MSIS-29 tends to perform better in higher disability strata (EDSS 5.5-8.5). 9 However, in patients with a higher EDSS, the response shift phenomenon introduces yet another type of variation, as patients with high EDSS scores tend to score better on the MSIS-29 measures over time based on a different appreciation of the impact of disability. 29 This effect results in improved scores in the absence of a change in functioning level. PROMs not only reflect physical limitations, but also psychological factors, resilience, and physical or psychological adaptations to changing physical abilities. Those who adapt to worsening physical limitations may retain similar scores on PROMs. Our investigation showed similar rates of worsening and improvement at a variety of thresholds, both lower and higher than those currently used. This suggests that a useful definition of significant change in the investigated PROMs may not exist.
Another aspect of the variation in PROMs can be seen in the comparison of unconfirmed and confirmed significant worsening. Most worsening events on an ideal outcome that measures ‘fixed’ change in HRQOL should persist over time, so that the difference between unconfirmed and confirmed worsening events should be small. In ASCEND, the number of events substantially decreased after introducing confirmation. A previous study in a community population of people with MS showed a large standard error of measurement with a relatively broad 95% confidence interval for individual MSIS-29 physical scores (SEM 5.0, 95% CI ± 9.8). 17 This implies that individual variation may exceed the MCID, introducing important problems for longitudinal studies that depend on the MCID as a threshold for significant change. Indeed, an investigation in a clinical cohort of people with MS showed that conventional PROMs for HRQOL in MS, including the SF-36 and MSIS-29, correlate well with the EDSS and T25FW cross-sectionally, but correlations between longitudinal changes in disability measures and PROMs were low, suggesting low reliability to detect disability worsening. 30
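The comparison between measurement error and the MCID can be made explicit. Assuming the reported interval is the usual normal-approximation interval of 1.96 standard errors around an individual score, the quoted figures give:

```latex
\[
1.96 \times \mathrm{SEM} = 1.96 \times 5.0 = 9.8 \;>\; 8 = \mathrm{MCID}_{\text{MSIS-29 Physical}}
\]
```

Under that assumption, the uncertainty around a single individual score is already wider than the threshold used to define significant change.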
Even though the main focus of this study was not to investigate factors contributing to HRQOL change, we did analyse some baseline factors that could have influenced the changes in PROM summary scores. Unfortunately, we were not able to include reliable measures of depression and fatigue, which often have an effect on HRQOL. 31 Analysis of other potential influencing factors such as treatment-arm, sex, and disability status showed that sex was associated with worsening in the MSIS-29 psychological subdomain. While the treatment arm was not associated with significant differences in the investigated PROMs in this study, natalizumab treatment was reported to have a positive effect on SF-36 summary scores in the AFFIRM and SENTINEL trials in relapsing-remitting MS. 32 It would be worthwhile to investigate longitudinal change in PROMs and the effect of treatment on HRQOL in relapsing-remitting MS cohorts.
A major strength of this study is its grounding in a clinical trial where systematic measurements and assessments were carried out. A potential limitation is the over 20% dropout rate that impacted both treatment arms. This may have dampened the changes in HRQOL if those who experienced the most changes systematically withdrew. While this could be an explanation for why people quit the study, the analyses of the incremental time points should have shown more robust changes if this were the case.
Taken together, the issues with responsiveness and the lack of longitudinal correlations with physician-based and performance-based outcome measures across the different disability strata make changes on the investigated HRQOL-related PROMs difficult to interpret. Given our results, significant change in the SF-36 and MSIS-29 as currently defined should not be used to inform clinical decision making. The use of these PROMs as primary longitudinal outcome measures in SPMS clinical trials is premature. Further research is necessary to determine which PROMs to use and how to define meaningful change on such measures in clinical trials in SPMS.
Footnotes
Author contributions
E.S. and M.K. conceptualised and designed the study, interpreted the results, wrote the first draft, reviewed and edited the manuscript for intellectual content. M.K. and E.S. performed the data analysis. All authors collaborated in interpreting the results, and in reviewing and editing the manuscript for intellectual content.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship and/or publication of this article: Dr Strijbis reports no disclosures. Dr Repovic received consulting and/or speaking honoraria from Alexion, Biogen, Celgene, Roche, Sanofi Genzyme, Viela and EMD Serono. Dr Mostert reports no disclosures. Dr Bowen received honoraria from serving on the scientific advisory board and speaker’s bureau of Biogen, Celgene, EMD Serono, Genentech and Novartis. He has received research support from AbbVie Inc, Alexion, Alkermes, Biogen, Celgene, Sanofi Genzyme, Genentech, Novartis and TG Therapeutics. Prof. Uitdehaag received consultancy fees and/or research support from Biogen, Sanofi Genzyme, EMD Serono, Novartis, Roche and Teva. Prof. Cutter served on data and safety monitoring Boards for Avexis Pharmaceuticals, Biolinerx, Brainstorm Cell Therapeutics, CSL Behring, Galmed Pharmaceuticals, Horizon Pharmaceuticals, Hisun Pharmaceuticals, Merck, Merck/Pfizer, Opko Biologics, Neurim, Novartis, Ophazyme, Sanofi-Aventis, Reata Pharmaceuticals, Receptos/Celgene, Teva pharmaceuticals, Vivus, NHLBI (Protocol Review Committee), NICHD (OPRU oversight committee). He participated in and received fees for consulting or advisory boards for Biogen, Click Therapeutics, Genzyme, Genentech, Gilgamesh Pharmaceuticals, GW Pharmaceuticals, Klein-Buendel Incorporated, Medimmune, Medday, Novartis, Osmotica Pharmaceuticals, Perception Neurosciences, Recursion Pharmaceuticals, Roche, Somahlution, TG Therapeutics. Prof. Cutter is employed by the University of Alabama at Birmingham and President of Pythagoras, Inc. a private consulting company located in Birmingham, Alabama, USA. Dr Koch received consulting fees and travel support from Biogen, Novartis, Roche, Sanofi Genzyme and EMD Serono.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
