Abstract
Motor dysfunction, particularly ataxia, is one of the predominant clinical manifestations in patients with multiple sclerosis (MS). Assessment of motor dysfunction suffers from high variability. We investigated whether the clinical rating of ataxia can be improved through the use of reference videos covering the spectrum of severity degrees as defined in the Neurostatus-Expanded Disability Status Scale. Twenty-five neurologists participated. The variability of their assessments was significantly lower when reference videos were used (SD = 0.12; range = 0.40 vs SD = 0.26; range = 0.88 without reference videos; p = 0.013). Reference videos reduced the variability of clinical assessments and may be useful tools to improve the precision and consistency of the clinical assessment of motor functions in MS.
Introduction
In multiple sclerosis (MS), clinical assessment scales, mainly the Expanded Disability Status Scale (EDSS), are used to quantify impairment and disability. The EDSS is known for low inter- and intrarater reliability and high variability, especially at lower EDSS scores. 1 Motor dysfunction, particularly ataxia, is one of the predominant clinical manifestations in patients with MS and a major contributor to disability progression. 2 Thus, reliable and consistent rating of ataxia is crucial for the follow-up of patients with MS.
Objective
The objective of this report is to investigate whether reference videos (RVs) exemplifying degrees of ataxia severity can reduce the variability of motor dysfunction assessment in MS.
Methods
Study design and participants
This study was a subproject of “Assess MS,” 3 a study approved by the local ethics committees. All patients gave written informed consent to the video recordings. Twenty-five raters (neurologists) from the university hospitals in Bern and Basel rated 60 videos of 43 MS patients performing the finger-to-nose test (FNT). The videos were recorded with a Microsoft Kinect™ 1 camera and chosen from >2000 videos recorded for the Assess MS study, with the constraint that all limb ataxia grades of the Neurostatus-EDSS definitions be covered.4,5 According to these definitions there are five grades of limb ataxia: 0 = no ataxia; 1 = signs only; 2 = tremor or clumsy movements easily seen, minor interference with function; 3 = tremor or clumsy movements interfere with function in all spheres; and 4 = most functions are very difficult. The ratings were performed at baseline and six weeks later (“retest”) to assess long-term intrarater agreement. In both rating sessions, 10% of the videos were presented twice to assess short-term intrarater agreement.
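The session layout described above (60 videos, 10% of them shown twice) can be illustrated with a minimal sketch. The report does not state how the repeated videos were selected or ordered, so the random selection, the shuffling, and the names `build_session`, `repeat_fraction`, and `seed` are assumptions made purely for illustration.

```python
import random


def build_session(video_ids, repeat_fraction=0.10, seed=0):
    """Return (presentation_order, repeated_ids) for one rating session.

    Assumption: repeated videos are drawn at random and the whole
    presentation list is shuffled; the source does not specify either step.
    """
    rng = random.Random(seed)
    video_ids = list(video_ids)
    n_repeats = round(len(video_ids) * repeat_fraction)
    # Videos that will be shown a second time within the same session.
    repeated_ids = rng.sample(video_ids, n_repeats)
    presentation_order = video_ids + repeated_ids
    rng.shuffle(presentation_order)
    return presentation_order, repeated_ids


# 60 videos with 10% repeats gives 66 presentations per session.
order, repeats = build_session(range(60))
```

The pairs of ratings given to each repeated video within one session are what a short-term intrarater comparison would be computed from; the same video rated at baseline and at retest gives the long-term pair.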
Forty-one RVs, different from the videos used for rating, were chosen by experienced neurologists of the Assess MS study. They also showed MS patients performing the FNT, with different degrees of limb-ataxia severity, based on the Neurostatus-EDSS definitions.4,5 The raters were randomized into two groups: one group assessing videos based only on the written Neurostatus-EDSS definitions, 5 without simultaneous access to the RVs (Setting 1), and the other, with simultaneous access to the RVs (Setting 2). The characteristics of the raters are summarized in Table 1. There was no difference in experience with MS patients between the groups (Setting 1 vs Setting 2).
Table 1. Characteristics of patients and neurologists participating in this study.
EDSS: Expanded Disability Status Scale; MS: multiple sclerosis; PPMS: primary progressive multiple sclerosis; RRMS: relapsing–remitting multiple sclerosis; SPMS: secondary progressive multiple sclerosis.
Patient performance
For FNT videos, MS patients were instructed by the recording neurologists of the Assess MS study to close their eyes and abduct their arms to 90 degrees at the shoulder in full extension, before touching the nose with the tip of their index finger, first with the dominant, then with the nondominant side (Figure 1).

Figure 1. Reference videos are shown on the right and the video to be rated on the left; below are fields for scoring the severity of the performance using the ataxia grades of the Neurostatus-Expanded Disability Status Scale definitions. According to these definitions there are five grades of limb ataxia: 0 = no ataxia; 1 = signs only; 2 = tremor or clumsy movements easily seen, minor interference with function; 3 = tremor or clumsy movements interfere with function in all spheres; and 4 = most functions are very difficult. People shown are not patients and gave written consent to be shown.
Video rating
Videos were presented for rating on a touchscreen. Setting 2 allowed for simultaneous presentation of RVs on the right part of the screen (Figure 1). Horizontal swipe allowed for viewing RVs of different limb-ataxia severity degrees; vertical swipe for viewing alternative RVs of the same severity degree. In Setting 1 this part of the screen remained black. Raters were allowed to view each video as often as required for scoring.
Statistics
The analysis was conducted using MATLAB R2014b (MathWorks, Natick, MA, USA). An F test was used to compare the variability of the ratings between the two rater groups (Setting 1 vs Setting 2). Interrater agreement was calculated as the intraclass correlation coefficient (ICC) for single measurements and absolute agreement. 6 Intrarater agreement was calculated as the percentage of identical ratings.
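To make the two agreement measures concrete, the following is a minimal Python sketch, not the authors' MATLAB code. It assumes the standard two-way ANOVA form of the single-measurement, absolute-agreement ICC (often written ICC(A,1)) and a complete ratings matrix with one row per video and one column per rater. The function names `icc_a1` and `percent_identical` and the toy data are hypothetical.

```python
from statistics import mean


def icc_a1(ratings):
    """ICC for single measurements and absolute agreement (two-way model).

    ratings: list of rows, one row per video, one column per rater.
    Needs at least 2 videos and 2 raters, and some variation in the data.
    """
    n = len(ratings)       # number of videos (rows)
    k = len(ratings[0])    # number of raters (columns)
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-video mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def percent_identical(first, second):
    """Percentage of videos that received the same grade on both occasions."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)


# Toy data: rater 2 always scores one grade higher than rater 1.
# Absolute agreement penalizes this offset: ICC = 10/13, about 0.77.
offset_icc = icc_a1([[0, 1], [1, 2], [2, 3], [3, 4]])
```

Because the absolute-agreement form includes the between-rater mean square in the denominator, a rater who is consistently stricter or more lenient lowers the ICC even when the ordering of the videos is identical across raters.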
Results
The variability of ratings was significantly lower in Setting 2 (standard deviation (SD) = 0.12; range = 0.40) than in Setting 1 (SD = 0.26; range = 0.88, F test; p = 0.013), as illustrated in Figure 2. The ICC for interrater agreement was numerically slightly higher in Setting 2 (0.816 (95% confidence interval (CI): 0.756–0.871) vs 0.756 (95% CI: 0.674–0.829) in Setting 1) but this difference was not significant. Short-term and long-term intrarater agreement were similar across settings (Setting 1: 79±18% and 69±11%; Setting 2: 75±22% and 68±9%, not significant).
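As a rough plausibility check, the variance ratio underlying an F test can be computed from the rounded SDs reported above, with Setting 1 in the numerator. This is only a sketch: the group sizes, and hence the degrees of freedom, are not reported here, so the p value cannot be reproduced from these numbers alone, and the exact statistic may differ because of rounding.

```latex
F = \frac{s_{1}^{2}}{s_{2}^{2}}
  = \frac{0.26^{2}}{0.12^{2}}
  = \frac{0.0676}{0.0144}
  \approx 4.7
```

In other words, the variance of the per-rater averages without RVs was between four and five times that with RVs.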

Figure 2. Ratings of Setting 1, i.e. the group without reference videos (“w/o ref”), are shown on the left, and those of Setting 2, i.e. the group with reference videos (“with ref”), on the right. Mean and standard deviation (SD) are shown in green, median in magenta. The variability of ratings was significantly lower in Setting 2 (SD = 0.12; range = 0.40) than in Setting 1, without reference videos (SD = 0.26; range = 0.88; F test, p = 0.013). Each dot represents the average of all ratings of one neurologist (blue at baseline and red six weeks later).
The average limb ataxia score (according to the Neurostatus-EDSS definitions) was slightly higher in Setting 2, with RVs, than in Setting 1 (mean score across test and retest after six weeks: 1.4 ± 0.1 vs 1 ± 0.3; p < 0.0001), as illustrated in Figure 2. No significant interaction was found between intrarater agreement, raters’ experience with MS or EDSS assessments, or the centers (data not shown).
Discussion
As “pars pro toto,” the results of this study show that using preselected RVs can reduce the rating variability in the assessment of limb ataxia of MS patients. The use of such videos can be easily implemented and does not require an additional/new scale, since we used the already well-established Neurostatus-EDSS definitions. 4 Whether this approach can also be used for assessments other than limb ataxia remains to be shown.
We found a small but statistically significant difference in the average severity level between the two settings, with higher ratings in the setting with RVs. As the ataxia degrees were assigned to the RVs by neurologists with special expertise in clinical ratings, this may have contributed to a stricter interpretation of the grade definitions. A further limitation of our study was the low number of severely affected patients (ataxia grades 3 and 4). In daily routine, however, rating lower-severity grades is more challenging than rating higher grades. Our RV approach may also have a role in training machine-learning algorithms (MLAs). One example is the Assess MS system, a potentially finer-grained tool to measure motor dysfunction in MS. 3 This system uses advanced MLAs to analyze three-dimensional depth-sensor recordings of MS patients performing standard tests of motor function, such as the FNT. Reducing the variability of the clinical assessments used to train MLAs should also contribute to improved algorithms derived from machine learning.
Conclusions
The use of RVs may represent a simple method to reduce variability in the assessment of motor dysfunction in MS. This method could be particularly useful in the context of clinical research, and to train MLAs.
Supplemental Material
Supplemental material for Reference videos reduce variability of motor dysfunction assessments in multiple sclerosis by Marcus D’Souza, Saskia Steinheimer, Jonas Dorn, Cecily Morrison, Jacques Boisvert, Kristina Kravalis, Jessica Burggraaff, Caspar EP van Munster, Manuela Diederich, Abigail Sellen, Christian P Kamm, Frank Dahlke, Bernard MJ Uitdehaag and Ludwig Kappos in Multiple Sclerosis Journal – Experimental, Translational and Clinical
Footnotes
Conflict of Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: M.D.S. has received travel support from Bayer AG, Teva Pharmaceuticals, and Sanofi Genzyme and research support from the University Hospital Basel. S.S. has received travel support from Bayer, Merck and Novartis and honoraria for consulting from Bayer, Merck, Roche, and Teva. J.D. is an employee of Novartis Pharma AG. C.M. is an employee of Microsoft Research. J.Bo. has nothing to declare. K.K. has nothing to declare. J.Bu. has received travel support from Novartis Pharma AG. C.E.P.V.M. has received travel support from Novartis Pharma AG, Sanofi Genzyme, and Teva Pharmaceuticals, and honoraria for lecturing and consulting from Biogen-Idec and Merck Serono. M.D. has nothing to declare. A.S. is an employee of Microsoft Research. C.P.K. has received honoraria for lectures as well as research support from Biogen-Idec, Novartis Pharma AG, Almirall, Bayer Schweiz AG, Teva Pharmaceuticals, Merck Serono, Sanofi Genzyme, and the Swiss MS Society. F.D. is an employee of Novartis Pharma AG. B.M.J.U. has received consultation fees from Biogen-Idec, Novartis Pharma AG, EMD Serono, Teva Pharmaceuticals, Sanofi Genzyme, and Roche. The Multiple Sclerosis Centre Amsterdam has received financial support for research from Biogen-Idec, Merck Serono, Novartis Pharma AG, and Teva Pharmaceuticals. In the last three years, L.K.’s institution (University Hospital Basel) received, and used exclusively for research support at the Department of Neurology, steering committee, advisory board, and consultancy fees from Actelion, Alkermes, Almirall, Bayer, Biogen, df-mp, Excemed, GeNeuro SA, Genzyme, Merck, Minoryx, Mitsubishi Pharma, Novartis, Receptos, Roche, Sanofi-Aventis, Santhera, Teva, Vianex, and royalties from Neurostatus products.
For educational activities of the department, the institution received honoraria from Allergan, Almirall, Bayer, Biogen, Excemed, Genzyme, Merck, Novartis, Pfizer, Sanofi-Aventis, Teva, and UCB.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Novartis (PO 3900002757).
Supplemental material
Supplemental material is available for this article online.
