Abstract
Background:
The MSIS-29 measures the physical and psychological impact of MS.
Objective:
The associations between MSIS-29 domains and demographic/clinical aspects were examined and trajectories analysed over time.
Methods:
Data were collected in the Trajectories of Outcome in Neurological Conditions study for a diverse population of people with MS, with follow-up for up to 5 years. Following Rasch analysis, minimal important change (MIC) was computed for ensuing total, physical and psychological domains.
Results:
Fit to the Rasch model using data from 5921 participants validated physical, psychological and total domains, and the conversion table transforms raw scores to interval-level metric equivalents. These domains showed significant differences across demographic (age, gender, employment, education, and marital status) and clinical (subtype, treatment, and duration) factors with large effect sizes. The MIC scores were physical: 9.1, total: 14.1, which were both above measurement error, and psychological: 5.5 which was not, so 1.6% of participants reported psychological change which was clinically important but not statistically significant. Trajectory analysis showed three groups, one stable and two with significant slopes, improving and deteriorating.
Conclusion:
The MSIS-29 has shown adequate fit to the Rasch model after accommodating problems with local item dependency, through a bi-factor solution. The domains showed good discrimination across key factors.
Introduction
Multiple sclerosis (MS) is a complex disease whose many symptoms impact upon disability, mood and quality of life. 1 Almost 42% of participants report reduced ability to perform daily activities, as well as negative effects on emotional and social factors. 2
It follows that physical and psychological functioning are two important traits to be considered in people with MS (pwMS). The MSIS-29 is a self-administered measure with 20 items covering physical aspects and 9 items covering psychological aspects, 3 which is reported to have high test–retest reliability and internal consistency.
This study examines the MSIS-29 in a large cohort of pwMS, looking at construct validity using the Rasch model, and other aspects such as minimal important change (MIC). It examines the association between the domains and key demographic and clinical aspects. The converted interval-level metrics are then used to explore the trajectory of domains over time.
Methods
Main sample
Participants were recruited into the Trajectories of Outcome in Neurological Conditions-MS (TONiC-MS) study (https://www.finders-study.org/tonic), where eligibility criteria included adults with MS (by revised McDonald criteria 4 ) of any subtype and disability level.
Disease subtypes at study entry were classified as relapsing remitting (RR), primary progressive (PP) or secondary progressive (SP). Duration since diagnosis and Expanded Disability Status Scale (EDSS) band were recorded from medical records. 5 Disease-modifying therapies (DMT) were categorised as low or high efficacy. 6 Written informed consent was obtained from all participants prior to enrolment. Ethical approval was granted from research committees (reference 11/NW/0743).
Longitudinal sample
Further questionnaire packs were sent at approximately 9-month intervals. At each follow-up, as well as repeating the questionnaire pack, respondents were asked to comment whether their disability and worry levels were worse, the same, or better compared to when they last completed a pack.
Calibration sample
Construct validity was examined using the Rasch measurement model. 7 To facilitate the analysis, a sample of 1000 was drawn from the full sample’s first three time points, and further randomised into two sub-samples of 500 for training and validation analyses. No individual was included more than once in the sample. 8 The sample size of each sub-sample was consistent with retaining a Type 1 error rate of 5% using the RUMM2030 software.9,10
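The sampling step described above can be sketched as follows. This is a minimal illustration only, not the authors' procedure: the `(participant_id, time_point)` record layout, the function name, the seed, and the choice of one randomly selected time point per person are all assumptions made for the example.

```python
import random

def draw_calibration_samples(records, n_total=1000, seed=2030):
    """Draw n_total unique participants and split them into two halves.

    records: iterable of (participant_id, time_point) tuples (hypothetical layout).
    Only time points 1-3 are eligible, mirroring the first three time points.
    """
    rng = random.Random(seed)
    # Group the eligible time points by participant.
    by_person = {}
    for pid, time_point in records:
        if time_point in (1, 2, 3):
            by_person.setdefault(pid, []).append(time_point)
    # Sample participants, not records, so nobody can appear more than once.
    chosen = rng.sample(sorted(by_person), n_total)
    # Keep one randomly chosen time point per sampled participant.
    sample = [(pid, rng.choice(by_person[pid])) for pid in chosen]
    half = n_total // 2
    # rng.sample returns a random ordering, so slicing gives a random split.
    return sample[:half], sample[half:]

# Synthetic data: 3000 hypothetical participants with three time points each.
records = [(pid, tp) for pid in range(3000) for tp in (1, 2, 3)]
training, validation = draw_calibration_samples(records)
```

Sampling at the participant level, rather than at the record level, is what guarantees that no individual contributes more than one response to either sub-sample.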
Outcome measures
Several patient-reported outcome measures (PROMs) were included in the pack in addition to the change scores on disability and worry. The questionnaires relevant to the current investigation are:
MS Impact Scale (MSIS-29) – The 29 items in MSIS-29 (v1) measure impact in five levels (not at all, a little, moderately, quite a bit, extremely) where respondents are asked to record ‘the impact of MS on your day-to-day life during the past 2 weeks’. Total score ranges 0–116, physical score ranges 0–80, and psychological score ranges 0–36. Higher scores indicate greater impact.
Hospital Anxiety and Depression Scale (HADS) – Two subscales measuring anxiety and depression have associated clinical cut points delivering none-possible-probable caseness. 11
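The MSIS-29 raw score ranges given above follow from 29 items scored 0 to 4. A minimal scoring sketch is shown below; the 0 to 4 coding is inferred from the stated ranges, and the assignment of items 1–20 to the physical domain and items 21–29 to the psychological domain, like the function and constant names, is an assumption for illustration.

```python
PHYSICAL_ITEMS = range(1, 21)        # assumed: items 1-20
PSYCHOLOGICAL_ITEMS = range(21, 30)  # assumed: items 21-29

def raw_scores(responses):
    """Sum raw MSIS-29 domain scores.

    responses: dict mapping item number (1-29) to a response coded 0-4
    (0 = not at all ... 4 = extremely).
    """
    if set(responses) != set(range(1, 30)):
        raise ValueError("all 29 items must be answered")
    if any(value not in (0, 1, 2, 3, 4) for value in responses.values()):
        raise ValueError("responses must be coded 0-4")
    physical = sum(responses[i] for i in PHYSICAL_ITEMS)
    psychological = sum(responses[i] for i in PSYCHOLOGICAL_ITEMS)
    return {
        "physical": physical,
        "psychological": psychological,
        "total": physical + psychological,
    }

# A respondent answering 'extremely' to every item reaches each maximum.
maximum = raw_scores({item: 4 for item in range(1, 30)})
```

With every item at its highest level, the sketch reproduces the upper bounds of 80, 36 and 116 stated for the physical, psychological and total scores.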
Statistical analysis plan
An overview of the application of the Rasch model is given in Tennant and Kucukdeveci, 12 and details for this analysis are in Supplemental File 1. Differential item functioning (DIF) refers to items that function differently between groups of participants: although the participants have the same level of the factor being measured, they answer the scale item differently. 13
One-way analysis of variance is applied to examine the discrimination across EDSS (for physical) and HADS caseness (for psychological). Should a total score be derived, it is tested against EDSS.
The standard error of measurement (SEM) and the smallest detectable difference (SDD) of the MSIS-29 domains are calculated from baseline data. The minimal detectable change (MDC) is the minimum change in score for an individual that must occur to be sure that the change is not just due to measurement error. 14 The MIC reflects the smallest change in score that pwMS perceive as meaningful. 15 The MDC and MIC are determined from longitudinal data. The MIC used an anchor-based method, based on the patients’ perceived change of disability for the physical domain and of worry for the psychological domain. 16 It was calculated as the largest of the upper (when positive) or lowest (when negative) limits of the 95% confidence intervals for the mean differences between before and after scores in the two groups rated as either ‘worse’ or ‘better’ by the respondents.17,18 For the change variables, the MDC and MIC can be combined to produce a four-fold classification. 19 This identifies groups where change was (1) neither statistically significant nor important (below both the MDC and the MIC); (2) significant but not important (above the MDC but below the MIC); (3) important but not significant (above the MIC but below the MDC); and (4) both significant and important (above both the MDC and the MIC). Effect sizes of the various estimates are reported. All values are calculated on the interval metric.
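The four-fold classification can be sketched as a simple comparison of a change score against the two thresholds. This is an illustrative sketch rather than the authors' code: the function name, the use of the absolute change (treating worsening and improvement alike), and the choice of inclusive thresholds are assumptions. The example thresholds are the MDC and MIC values reported later in this article.

```python
def classify_change(change, mdc, mic):
    """Classify a change score on the interval metric against MDC and MIC."""
    magnitude = abs(change)            # assumption: direction is ignored
    significant = magnitude >= mdc     # beyond measurement error
    important = magnitude >= mic       # meaningful to the respondent
    if significant and important:
        return "significant and important"
    if significant:
        return "significant but not important"
    if important:
        return "important but not significant"
    return "neither significant nor important"

# Psychological domain: MIC 5.5 lies below MDC 5.7, so a change of 5.6
# is important yet cannot be separated from measurement error.
psychological_example = classify_change(5.6, mdc=5.7, mic=5.5)

# Physical domain: MIC 9.1 lies above MDC 4.8, so a change of 6.0
# is detectable but not large enough to be considered important.
physical_example = classify_change(-6.0, mdc=4.8, mic=9.1)
```

Note that the third category can only be populated when the MIC is smaller than the MDC, which is why it arises for the psychological domain but not for the physical or total domains.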
Using the metric transformation of each domain, a group-based trajectory model (GBTM) is applied in the full data set to ascertain if there were groups displaying different trajectories over time. 20 Details of GBTM methods are in Supplemental File 1.
Results
Sample descriptions
Cross-sectional data
Mean age at baseline in the full sample of 5921 pwMS was 50.2 years (SD 12.0), mean duration of MS was 11.1 years (SD 9.8), and 73.8% were female. By subtype, 66% were RR, 22.9% were SP, and 11.2% PP. Over half (51.3%) were EDSS 4 or below (independently ambulant); 37% were EDSS 4.5–6.5, 11.4% were EDSS 7–9.5, and 0.25% unknown. There was a significant difference in EDSS level by disease subtype with, for example, EDSS 0–4 ranging from 8.8% in SP, to 71.4% in RR (χ2 = 2.1 × 10^3, df = 9; p < 0.001). Information on DMT use was available for 5633 (95.1%) of participants. Overall, 44.2% were on DMT, including 59.7% of those with RR, 15.3% of SP and 3.6% of PP. The most widely used DMT was an interferon (see Table 1). Within RR, 39.2% were on low efficacy DMT, 20.5% high efficacy. 6
Tabulation of disease-modifying therapies at consent.
DMT: disease-modifying therapies.
Longitudinal data
Data from 2416 pwMS who had completed at least their baseline and first follow-up questionnaires were analysed. Mean time from baseline to first follow-up was 22.6 months (SD 13.2), median time 19.6 months (interquartile range [IQR]: 10.7–31.4). Mean age was 50.7 years (SD: 11.5); MS duration 11.0 years (SD: 9.8); 65.0% were RR, 23.1% SP and 11.9% PP. There were no significant differences in MS subtype between those followed-up and the remainder of the full sample (χ2 = 3.07, df = 2; p = 0.215). In total, 75.1% were female, and 45.6% were on DMT.
Calibration sample
The calibration sample displayed no significant difference to those remaining in the full sample across age group, gender, MS subtype, duration group, DMT, or EDSS levels (χ2 tests, all p > 0.05).
Fit to the Rasch model
Fit of the physical and psychological domains to the Rasch model was examined in the calibration sample. Full details are given in Supplemental File 2. Briefly, the person-item (threshold) distribution of the 20 physical items in the training sample is shown in Figure 1. The scale is reasonably well targeted, although weak at the lower end of physical disability, with a floor effect shown between −4.2 and −5.2 logits. Item transition from ‘Not at all’ to ‘A little’ (threshold 1) is mostly observed at the lower impact level, while the transition from ‘Quite a bit’ to ‘Extremely’ (threshold 4) is at the high impact end. The item in which movement away from ‘Not at all bothered by’ was most easily achieved was ‘Do physically demanding tasks’. In contrast, the item ‘Difficulty moving about indoors’ was the least likely to transfer from ‘Quite a bit’ to ‘Extremely bothered’. There is no particular item order across this range of measurement.

Person-item threshold distribution of physical domain in training sample. Showing distribution of person estimates (above the x-axis) and item threshold estimates (below the x-axis).
There were several disordered thresholds associated with the transition from ‘A little’ to ‘Moderately’. There were breaches of local item independence which had to be accommodated, for example ‘I worry about how I will cope with the future’ and ‘Despite my difficulties I still manage to cope with daily life’ had a residual correlation of 0.483. Following this, fit to the model was achieved using a testlet approach, with the first 10 items grouped as ‘physical’ and the remaining 10 items grouped as ‘participation’ (e.g. ‘Limitations in your social and leisure activities at home’). The result was replicated in the validation sample.
In the psychological domain, fit to the model was poor. While the person-item distribution was adequate, three pairs of items were locally dependent. With an average residual correlation of −0.11, the pair of items ‘Feeling mentally fatigued’ and ‘Problems concentrating’ displayed a residual correlation of 0.182. ‘Feeling mentally fatigued’ was also the easiest item regarding moving away from ‘Not at all bothered’. The item ‘Feeling depressed’ was the one where the transition from ‘Quite a bit’ to ‘Extremely’ was the most difficult to achieve. Disordered thresholds were present in six out of nine items, with the transition from ‘A little’ to ‘Moderately’ the source of the problem. Clustering locally dependent items into ‘super items’ (i.e. post hoc following LD analysis) achieved fit.
When the viability of a total score from all 29 items was examined, fit was poor in the training sample. Principal component analysis of the residuals split the item set by domain, with 33.6% of t-tests significant at the 5% level. Inspection of the item set and the pattern of local dependency suggested there were item clusters which were conceptually linked (e.g. items 1–4 physical; items 25–29 mood). Grouping these sets into two testlets, each combining sets of physical and psychological items, resulted in good fit to the model where just 3% of the variance needed to be discarded (Supplemental File 2, Table S1). Of note, the distribution of items and persons for the total score was more inclusive of the range of impact.
In the validation sample, the results were replicated except that DIF appeared for subtype, age, and duration. As those with SP tend to be older and to have longer disease duration, subtype was split for SP and the person estimates derived from the unsplit and split solutions were compared. The p value of the paired t-test of the difference was 0.1043, so the unsplit solution was retained. The DIF for age and duration was no longer evident after subtype was split.
Conversion of raw scores to interval metric for all three domains is given in Table 2.
Conversion table to convert raw scores to interval-level metric for MSIS-29 total, physical and psychological domains.
Instructions for use of the conversion table for MSIS-29 v1.
Providing the respondent has answered all the items, take the raw score and look across to the interval scale estimate for the relevant domain.
For example, if you are converting the total score, a raw score of 100 would give a standardised metric total score of 84.6.
A raw physical score of 40 gives a standardised metric of 42.4.
A raw psychological score of 35 gives a standardised metric of 32.9.
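The lookup described in these instructions can be sketched as below. Only the three raw-to-metric pairs quoted in the examples above are included; the full Table 2 would be needed in practice. The function name, the dictionary layout and the error handling are assumptions made for illustration.

```python
# Excerpt only: the three pairs quoted in the instructions above.
CONVERSION_EXCERPT = {
    "total": {100: 84.6},
    "physical": {40: 42.4},
    "psychological": {35: 32.9},
}
N_ITEMS = {"total": 29, "physical": 20, "psychological": 9}

def to_interval_metric(domain, item_responses, table=CONVERSION_EXCERPT):
    """Convert a complete set of item responses (coded 0-4) to the metric."""
    # The conversion only applies when every item in the domain was answered.
    if len(item_responses) != N_ITEMS[domain] or any(
        response is None for response in item_responses
    ):
        raise ValueError("conversion requires all items to be answered")
    raw = sum(item_responses)
    if raw not in table[domain]:
        raise KeyError(f"raw score {raw} is not in this excerpt of the table")
    return table[domain][raw]

# Twenty physical items answered 'moderately' (coded 2) give a raw score of 40.
physical_metric = to_interval_metric("physical", [2] * 20)
```

Rejecting incomplete responses, rather than prorating them, mirrors the condition that the respondent must have answered all the items before the table is used.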
Descriptives, discrimination and detection
The parameter estimates for the three domains, physical, psychological, and total, were exported into the main data set for analysis. Metric domain levels for demographic and clinical characteristics are shown in Table 3. Most domains displayed significant differences across demographic and clinical factors. However, with such a large sample, statistical significance was often generated where the actual difference was small. For example, the effect size of the significant difference between those married/cohabiting, or not, was 0.10 on the physical domain and 0.16 on the psychological domain, both considered trivial. In contrast, the total score across the age gradient has an effect size of 0.62, considered medium. The difference in physical function between high and low DMT has an effect size of just 0.22, considered (very) small. The difference in level of physical functioning between those on high-efficacy DMT and those not on any DMT was 0.24.
Descriptive statistics of the metric MSIS-29 in the baseline sample: physical, psychological and total domains. N = 5795.
SD: standard deviation; PP: primary progressive; RR: relapsing remitting; SP: secondary progressive; DMT: disease-modifying therapy; HND: higher national diploma.
The discriminant validity (effect size) of the three domains is shown for relevant comparator measures in Table 4; strong significant gradients were found for every domain. MDC and MIC of the various domains are also shown.
Discriminant ability (effect size) of the MSIS-29 at baseline together with MDC and MIC from first follow-up.
EDSS: Expanded Disability Status Scale; HADS: Hospital Anxiety and Depression Scale; SEM: standard error of measurement; SDD: smallest detectable difference; %OR: percent of operational range of scale; SEMc: standard error of measurement of change score; MDC: minimal detectable change; MIC: minimal important change.
Confidence intervals of the worse and better estimates used to generate the MIC.
In the longitudinal data, 41.2% reported their disability had worsened, 53.3% reported that it had stayed the same, and 5.5% reported improvement. Worry was the same for 66.9%, worse for 24.7% and improved for 8.5%. Using the metric transformation of the various domains, Table 5 shows the distribution of the MDC and MIC. Both the physical and total scales can fully identify the MIC, but for the psychological scale, there were 1.6% of respondents where the change was important but could not be distinguished from measurement error.
Pattern of important and significant changes in MSIS-29 based upon the Minimal Detectable Change and Minimal Important Change.
Trajectory analysis
Three groups were identified meeting the criteria specified in Supplemental File 1. Following physical and psychological aspects over 5 years, both physical and total domains showed small numbers (group 1: 7.4% for physical trajectories and 11.4% for total trajectories) with a low level of functioning which slightly improved (Figure 2(a) and (c)). In the physical domain, group 3 (66.7%) had a significant worsening over time. In the total domain, group 2 (28.9%) showed a significant worsening while 59.7% showed no significant increase over the follow-up. There was no significant movement in the three groups identified in the psychological domain (Figure 2(b)).

(a) MSIS-29 physical trajectories, (b) MSIS-29 psychological trajectories and (c) MSIS-29 total trajectories.
Discussion
This study supports the construct validity of the MSIS-29 through fit to the Rasch measurement model, having accommodated local item dependencies. The physical and psychological domains were confirmed, though there was a limitation in the physical domain at the lower end of the scale, as observed previously. 21 A total domain was also identified, which showed no limitation across the full impact experienced by pwMS.
The domains showed strong discrimination across the comparator measures and most clinical and demographic factors, although effect sizes were often trivial. All three domains showed the ability to identify the MIC, albeit with a small proportion of the psychological domain being undifferentiated from measurement error. The MIC for the physical domain is 9.1 when appropriately calculated on the metric. Earlier work using receiver operating characteristic curves for EDSS range 5.5–8 found an MIC of 8. 22 In the trajectory analysis, both the physical and total domains showed a small group improving over time and a much larger group who worsened. The psychological groups remained stable over the 5 years follow-up.
The MSIS-29 demonstrated some problems, notably the disordered thresholds in many of the items, also identified in previous studies applying the Rasch model to MSIS-29 data.23,24 However, there is inconsistency in these reports with a community sample reporting disordered thresholds and lack of evidence for a total score, 25 while a clinical trial sample reported fully ordered thresholds. 21 A further clinical trial reported ordered thresholds and suggested that the scale could be restructured into three domains, effectively splitting the physical domain into ‘symptoms’ and ‘general limitations’ item sets. 24 This study split the physical item set into ‘physical’ and ‘participation’ groups, based on the conceptual basis of the International Classification of Functioning, Disability and Health (ICF). 26 The ‘physical’ items are a mix of impairments, or physical symptoms, and activity limitations. The difference in this study is that by applying the bi-factor structure, these two item sets worked as a single domain with little loss of variance. Previous work suggested that the range of impact covered by physical domain items did not match the patient range of impact, particularly for patients with lower impact.21,24 This study supported that finding for the physical scale but showed no such shortfall for the measurement range for the total scale.
Our findings and earlier studies suggest that item thresholds from RR clinical trials work as intended, but not in community studies with varied subtypes. This raises the question as to whether the domains are invariant by MS subtype. In this study, while there was some lack of invariance, mostly driven by the SP subtype, this did not prove to be substantial. However, these analyses were run on testlets where some invariance may have been accommodated. 27
This study shows that the MSIS-29 is suitable for both clinical and epidemiological use providing the raw scores are transformed to the interval-level metric using Table 2. This is important as the local dependency and associated multidimensionality of the MSIS-29 have been resolved by the bi-factor solutions underlying Table 2. The physical and total scores show little floor or ceiling effect, and their MICs are well above the MDC or ‘noise’ of the scale at 9.1 (MDC 4.8) and 14.1 (MDC 7.6), respectively. Thus for these domains, all changes considered important by pwMS should be detectable. The physical domain requires just 6% and the total score 6.6% of operational range before error is overcome. Furthermore, the trajectory analysis indicates that the physical and total scores can track groups of pwMS who are stable, worsen or improve. These characteristics demonstrate the value of the interval metrics for epidemiological research and clinical care. Clinicians and researchers need to be able to detect clinically relevant change and distinguish between real change and measurement error. Knowing what level of change is significant for patients is critical for interpreting the impact of an intervention producing change.
In contrast, the psychological domain had an MIC at 5.5 which was less than the MDC (5.7) and thus meaningful change to the pwMS can occur within measurement error; our data showed a small number of pwMS reporting important change which would be undetected as falling within the MDC. The MDC was 15.8% of the operational range so many changes might have to be disregarded as falling within measurement error. In research, a much larger sample would be required to show change on the MSIS-29 psychological scale. The trajectory analysis using the psychological domain could detect distinct groups of pwMS who entered the study with very different levels of psychological impact from their MS, half showing some impact and about 40% more severely impacted. About 10% had little psychological impact. These three groups remained at stable levels of impact.
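The percentages of operational range quoted for the three domains can be checked with a short calculation. The sketch below assumes that each interval metric spans the same range as its raw score (0–80, 0–116 and 0–36), which is an inference from the score ranges and conversion examples given earlier rather than something stated explicitly; the function name is illustrative.

```python
def percent_of_operational_range(mdc, scale_max, scale_min=0.0):
    """Express the MDC as a percentage of the scale's operational range."""
    return 100.0 * mdc / (scale_max - scale_min)

# (MDC, assumed upper bound of the metric) for each domain.
domains = {
    "physical": (4.8, 80),
    "total": (7.6, 116),
    "psychological": (5.7, 36),
}

percent_or = {
    name: round(percent_of_operational_range(mdc, upper), 1)
    for name, (mdc, upper) in domains.items()
}
```

Under that assumption the calculation gives 6.0% for the physical domain, 6.6% for the total score and 15.8% for the psychological domain, in line with the figures reported in the text.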
Future work should address whether the MIC is stable across different subtypes of MS, particularly for the psychological domain. 28 Any factors which predispose pwMS to fall into the improving group for MSIS-29 merit investigation as they may suggest improvements to clinical care. The stability of trajectories of the psychological domain requires more investigation.
In conclusion, the total, physical and psychological scores for the MSIS-29 can easily be converted to interval-level measurement, thus permitting parametric analyses such as change scores and trajectory examination. Clinicians and researchers may be confident in the measurement precision and discriminant ability of the MSIS-29, both in its subscales and total score. Clinically important change (MIC) for the physical and total scores is low and above measurement error (MDC), but the psychological score is less robust. The impact of MS varies within a large population of pwMS with different subgroups showing physical and total domain trajectories which remain stable, worsen or sometimes improve.
Supplemental Material
Supplemental material, sj-pdf-1-msj-10.1177_13524585241288393 for Physical and psychological aspects of multiple sclerosis: Revisiting the Multiple Sclerosis Impact Scale (MSIS-29) by Carolyn A Young, David J Rog, Basil Sharrack, Radu Tanasescu, Seema Kalra, Suresh K Chhetri, Lisa Wilde, Roger J Mills and Alan Tennant in Multiple Sclerosis Journal
Footnotes
Acknowledgements
The authors thank the participants and their families for their invaluable contributions, the research and clinical staff for recruitment, and the TONiC team. Participating Investigators of UK Trajectories of Outcome in Neurological Conditions-MS Study Group: Prof Carolyn Young, Walton Centre NHS Trust, Liverpool; Dr David Rog, Northern Care Alliance NHS Foundation Trust, Salford; Prof Basil Sharrack, Sheffield Teaching Hospitals NHS Foundation Trust; Dr Radu Tanasescu, Nottingham University Hospitals NHS Trust; Dr Seema Kalra, University Hospital of North Midlands NHS Trust; Dr Suresh Chhetri, Lancashire Teaching Hospitals NHS Foundation Trust; Ms Helen Santander, Brighton and Sussex University Hospitals NHS Trust; Dr Tim Harrower, University of Exeter Medical School; Dr Oliver Leach, Royal Cornwall Hospitals NHS Trust; Dr Richard Nicholas, Charing Cross; Prof Helen Ford, Leeds Teaching Hospital NHS Trust; Dr John Woolmore, University Hospitals Birmingham NHS Foundation Trust; Dr Chris Kipps, Royal Hampshire County Hospital; Dr Clare Johnston, The York Hospital; Dr John Thorpe, North West Anglia NHS Foundation Trust; Dr David Paling, Sheffield Teaching Hospitals NHS Foundation Trust; Dr Mazen Matar, University Hospitals Leicester NHS Trust; Dr Cathy Ellis, Darent Valley Hospital; Dr Ashwin Pinto, University Hospital Southampton NHS Foundation Trust; Prof C Oliver Hanemann, Peninsula Medical School; Prof Siddharthan Chandran, University of Edinburgh; Prof Andrea Malaspina, Barts and the London School of Medicine; Dr Jo Kitley, Portsmouth Hospitals NHS Trust; Prof Jacqueline Palace, University of Oxford; Ms Tracy Fuller, Queen Elizabeth Hospital NHS Foundation Trust, Norfolk; Dr Pat Mottram, Countess of Chester Hospital; Ms Helen Terrett, Southport and Ormskirk NHS Trust; Dr Antonio Scalfari, London North West University Healthcare NHS Trust.
Data availability statement
Data supporting this study are not openly available due to reasons of sensitivity and are available from the corresponding author upon reasonable request. Data are located in controlled access data storage at Walton Centre NHS Trust.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Multiple Sclerosis Society (grant no. C009-16.1), Biogen Idec, Genzyme, Merck Serono, Novartis, Roche, Teva and Neurological Disability Fund 4530. The authors also thank the NIHR CLRN for research support. Radu Tanasescu received support from MRC (grant no. CARP MR/T024402/1). Suresh K.Chhetri is supported by the NIHR Lancashire Clinical Research Facility at Lancashire Teaching Hospitals NHS Foundation Trust. None of the funding sources had any role in the design of this study nor any role in its execution, analyses, interpretation of the data, or decision to submit results.
Supplemental material
Supplemental material for this article is available online.
