Abstract
Background
ABILHAND, a manual ability patient-reported outcome instrument originally developed for stroke patients, has been used in multiple sclerosis clinical trials; however, psychometric analyses indicated the measure’s limited measurement range and precision in higher-functioning multiple sclerosis patients.
Objective
The purpose of this study was to identify candidate items to expand the measurement range of the ABILHAND-56, thus improving its ability to detect differences in manual ability in higher-functioning multiple sclerosis patients.
Methods
A step-wise mixed methods design strategy was used, comprising two waves of patient interviews, a combination of qualitative (concept elicitation and cognitive debriefing) and quantitative (Rasch measurement theory) analytic techniques, and consultation interviews with three clinical neurologists specializing in multiple sclerosis.
Results
The original ABILHAND was well understood in this context of use. Eighty-two new manual ability concepts were identified. Draft supplementary items were generated and refined with patient and neurologist input. Rasch measurement theory psychometric analysis indicated that the supplementary items improved targeting to higher-functioning multiple sclerosis patients and measurement precision. The final pool of Early Multiple Sclerosis Manual Ability items comprises 20 items.
Conclusion
The synthesis of qualitative and quantitative methods used in this study improves the ABILHAND content validity to more effectively identify manual ability changes in early multiple sclerosis and potentially help determine treatment effect in higher-functioning patients in clinical trials.
Introduction
In addition to walking disability, cognitive problems, depression, and fatigue, manual disability is a prominent problem for many people with multiple sclerosis (MS)1–4 that affects the ability to perform essential activities of daily living efficiently and independently.1,5 Manual disability is common2–4,6 even in the early or mild stages of the disease, with up to 60% of patients reporting symptoms in the first year post-diagnosis.2 Therefore, change in manual ability is an important aspect to monitor in clinical practice for disease progression or therapeutic effect. Traditionally, in clinical trials, manual ability has been assessed using performance outcome measures, such as the Nine-Hole Peg Test (9HPT).7,8 These assessments, although practical for use in clinical settings, are not by themselves informative about the daily life impact of MS (and potential treatment benefit) on patients’ manual ability. Therefore, more robust patient-reported outcomes (PROs) of manual ability are needed for pivotal clinical trials and in the usual care setting to assess treatment benefit from the patients’ perspective.
ABILHAND is a PRO instrument originally developed to assess manual disability in stroke9 but has recently been used in clinical trials for MS.10–12 It is essential to evaluate the extent to which any PRO instrument provides valid measurement and appropriately reflects the patient experience in any new context of use.13,14 This may be achieved through the discipline of psychometrics15 where three paradigms exist: traditional psychometrics based on classical test theory (CTT),16 and modern psychometrics including Rasch measurement theory (RMT)17,18 and item response theory (IRT). A previous CTT study of ABILHAND-23 in MS suggested adequate reliability and validity.19 However, subsequent RMT evaluations of ABILHAND-2319,20 and ABILHAND-5620 indicated limited measurement range and precision (i.e., increased error associated with measurement) in MS patients with Expanded Disability Status Scale (EDSS) levels of 0–2, which limits ABILHAND’s ability to detect differences in manual ability in higher-functioning MS patients. Additional item fit analyses further suggested that there is probably more than one clinical concept related to manual ability underlying the scale; these concepts are “fine motor” (dexterity) and “power.”20
Given these limitations, the goal of the study presented here was to troubleshoot the ABILHAND-56 to increase its applicability to the broadest possible population of patients with MS. As ABILHAND-56 is used on an ongoing basis in a specific drug development program, addressing its measurement limitations in higher-functioning MS patients is important to improve measurement range, precision, and potential to detect treatment effect, and to confirm item clarity and relevance in MS. In this multi-phase, mixed methods study, we aimed to extend previous work by identifying additional candidate items for the two clinical concepts underpinning the ABILHAND-56, and thus to improve its ability to detect differences in manual ability in higher-functioning MS patients.
Materials and Methods
Study Design Overview
We used a step-wise mixed methods design strategy comprising two waves of patient interviews, a combination of qualitative and quantitative analytic techniques, and consultation interviews with three clinical neurologists specializing in MS (see Figure 1).

Figure 1. Study overview. EMS: Early Multiple Sclerosis; MS: multiple sclerosis; RRMS: relapsing–remitting multiple sclerosis.
Mixed methods design is broadly defined as the combination and comparison of multiple data sources, data collection, analytical procedures, or research methods.21 In psychometric research, mixed methods specifically refers to the synthesis of qualitative and quantitative methods to identify, define and operationalize PRO instruments as measures of a given concept of interest in a specific context of use.14
Study Population and Recruitment Process
Institutional review board approval was obtained, and written informed consent was provided by all study participants. Early relapsing–remitting MS (RRMS) patients were recruited through the study sponsor’s patient services department and through a social media site for MS patients. Patients were eligible to participate if they were diagnosed with RRMS within the last two years and had a Patient Determined Disease Steps (PDDS)22 score of 0–1 (no to mild disability). The PDDS range was selected to coincide with the EDSS 0–2 levels where previous research indicated limitations in the ABILHAND’s measurement range and precision.
Patient Interviews
In Wave 1, concept elicitation interviews were used to identify aspects of manual ability relevant to this patient sample. This was to guide identification of new items that could be used to supplement the ABILHAND. We then asked patients to complete the ABILHAND-56 to further assess its relevance in early RRMS.
In Wave 2, we conducted cognitive debriefing interviews to establish relevance, clarity, and ease of completion of the draft supplementary items that were generated in Wave 1. A “think aloud” process was followed where patients were asked to complete the items while thinking aloud and specifically noting any queries, problems, or ambiguities of the questionnaire.23 All interviews were conducted over the telephone; the ABILHAND-56 and supplementary items were displayed on patients’ computer screens and item responses captured via an online platform. Interviews were audio-taped and transcribed verbatim. In addition, consultation interviews with three neurologists specializing in MS (SCohan, MDG, KKR) were conducted at each of the two waves.
Materials
Based on the findings of our previous psychometric analysis,20 an expanded four-level response scale, very easy, easy, difficult, and impossible, was used to improve the ABILHAND-56’s potential to capture manual disability in this early RRMS sample.9
Data Analysis
Qualitative analysis – concept elicitation
Transcripts were analyzed thematically24 using detailed line-by-line coding25 to examine, compare, and develop treatment benefit conceptual domains using ATLAS.ti software.26 Coding was targeted to manual ability. Codes and quotations were inductively categorized into overarching domains that reflected their conceptual underpinning. Each code was compared with the rest of the data to create analytical domains and sub-domains. Saturation was assessed by ordering interviews chronologically, then grouping these into quantiles and comparing concepts emerging by each sequential quantile to assess whether saturation had been reached (i.e., no new concepts emerged).
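The saturation check described above can be illustrated with a short sketch. It is not the authors' procedure as implemented: the function name, the equal-sized sequential grouping, and the toy concept labels are all assumptions made for illustration. It takes interviews in chronological order, splits them into sequential groups, and reports which concepts appear for the first time in each group; an empty final group is consistent with saturation.

```python
from typing import List, Set


def new_concepts_by_group(interviews: List[Set[str]], n_groups: int) -> List[Set[str]]:
    """Return, for each sequential group of interviews, the concepts first seen in it.

    `interviews` must already be in chronological order; each element is the
    set of coded concepts raised in one interview (hypothetical structure).
    """
    if not 1 <= n_groups <= len(interviews):
        raise ValueError("n_groups must be between 1 and the number of interviews")
    size, extra = divmod(len(interviews), n_groups)
    seen: Set[str] = set()
    new_per_group: List[Set[str]] = []
    start = 0
    for g in range(n_groups):
        # Earlier groups absorb the remainder so group sizes differ by at most one.
        end = start + size + (1 if g < extra else 0)
        group_concepts = set().union(*interviews[start:end])
        new_per_group.append(group_concepts - seen)
        seen |= group_concepts
        start = end
    return new_per_group


# Toy data only; these are not concepts or counts from the study.
toy_interviews = [
    {"buttoning", "typing"},
    {"typing", "opening jars"},
    {"buttoning"},
    {"opening jars", "typing"},
]
toy_result = new_concepts_by_group(toy_interviews, n_groups=2)
toy_saturated = len(toy_result[-1]) == 0
```

In this toy example every concept appears in the first group, so the last group contributes nothing new.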
Qualitative analysis – cognitive debriefing
This analysis aimed to identify any potential wording ambiguities and assess relevance and acceptability in relation to each question item, response scale and set of instructions as well as identify additional items that could expand the measurement of manual disability in early RRMS.23
Item generation
Item generation followed item construction principles,13,27–29 aiming to have an adequate range of items to cover the conceptual breadth within each of the upper limb mobility sub-domains. Concepts chosen for item development were activities that were applicable to the broadest range of people with MS. Lay language was used in item construction, using as many of the patients’ own words as possible while aiming for brevity and minimal semantic overlap.
Quantitative data analysis
A small-scale RMT analysis was performed on data available for the ABILHAND-56 at Wave 1 and ABILHAND-56 as well as supplementary items at Wave 2 using RUMM2030 analytical software.30 RMT analysis compares observed data against the stringent criteria of the Rasch model, broadly aiming to assess the sample-to-scale targeting, the measurement continuum, and sample measurement.31,32 Considering the small sample size, which would not permit any confirmatory conclusions to be made about the items’ measurement properties, the focus of this quantitative analysis was on whether the supplementary items improved sample-to-scale targeting. Targeting refers to the match between the distribution of a construct (e.g., manual disability) in the sample and the range of the construct measured by a PRO instrument.33,34 The better this match is, the greater the potential for accurate evaluation of a PRO instrument and accurate person measurement. Results were interpreted with reference to published criteria wherever possible.32
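For readers less familiar with RMT, the standard textbook form of the Rasch model is shown here as background. The text does not state which parameterization was fitted, so the notation is ours: \theta_n is the location of person n, \delta_i the location of item i, and \tau_{ki} the k-th threshold of item i, all on the same logit continuum.

```latex
% Dichotomous Rasch model: probability of a positive response
\[
P(X_{ni}=1 \mid \theta_n, \delta_i)
  = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
\]

% Polytomous Rasch model for an item scored x = 0, 1, ..., m_i
% (the empty sum for j = 0 is taken to be 0)
\[
P(X_{ni}=x \mid \theta_n, \delta_i, \tau_{1i}, \dots, \tau_{m_i i})
  = \frac{\exp\left(\sum_{k=1}^{x} (\theta_n - \delta_i - \tau_{ki})\right)}
         {\sum_{j=0}^{m_i} \exp\left(\sum_{k=1}^{j} (\theta_n - \delta_i - \tau_{ki})\right)}
\]
```

Because person and item locations share one continuum, targeting can be read as the overlap between the distribution of the \theta_n and the span of the item thresholds: where few thresholds lie near a person's location, that person's estimate carries a larger standard error.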
Results
Study Sample
RRMS patients (n=88), with an RRMS diagnosis <27 months, participated in Wave 1 interviews, 69.3% (n=61) of whom reported difficulties with manual ability at screening (Table 1).
Table 1. Sample characteristics.
PDDS: Patient Determined Disease Steps; RMT: Rasch measurement theory; SD: standard deviation.
Wave 1 Qualitative Results
Concept elicitation
Eighty-two unique codes related to manual disability were identified. Seventy-five of these emerged as “upper limb” concepts in initial coding; seven additional upper limb concepts were identified in retrospective review of activity limitation concepts. Inductive categorization of these concepts into higher order sub-domains and domains replicated the two-level manual disability conceptual structure suggested in earlier work.20 Early RRMS patients indicated issues with upper limb mobility related to dexterity that were categorized under the “fine motor” sub-domain as well as issues related to strength that were categorized under the “power” sub-domain (Table 2). Consultation with the three neurologists specializing in MS was supportive of the two-domain structure.
Table 2. Examples of patient descriptions under fine motor and power sub-domains.
Saturation analysis indicated that the 88 interviews produced a comprehensive set of concepts with relation to manual disability in higher-functioning people with RRMS; 66 of 75 of the initially identified upper limb mobility concepts arose within the first 30 interviews and the remaining nine concepts either echoed concepts derived from earlier interviews, were not generalizable to the entire MS population, or already existed in the ABILHAND-56.
Item generation
Of the 82 identified concepts, 40 were not covered by existing ABILHAND items; neurologist feedback suggested that 20 of these 40 were more clinically relevant to MS patients with less severe manual disability. This led to the drafting of 23 items: 11 “fine motor” and 12 “power” items. We identified these item sets as Early Multiple Sclerosis Manual Ability – Fine Motor and Early Multiple Sclerosis Manual Ability – Power.
Cognitive debriefing, item reduction and refinement
Findings from Wave 2 interviews suggested that 20 of the 23 supplemental items were well-understood and acceptable to patients. However, three items appeared to overlap in sub-domains. Patients interpreted “washing hair in the shower” and “holding a full bag of groceries” as relating to both lower limb and manual ability. “Holding the steering wheel while driving for a long time” was deemed unclear as patients associated this item with multiple actions (including turning the wheel and shifting gears). Subsequent consultation with neurologists led to removal of the three items not focused on manual ability and to wording revisions of the remaining supplementary items. For example, “inserting a cable into a USB port” was changed to the more widely-applicable task of “inserting a cell phone charging cable into a cell phone.”
Final supplementary items for ABILHAND in early MS
Findings from Wave 2 supported a final item pool comprising 10 “fine motor” and 10 “power” Early Multiple Sclerosis Manual Ability items (Table 3).
Table 3. ABILHAND plus Early Multiple Sclerosis (MS) Manual Ability items, by theorized sub-scale.
Quantitative Results: RMT Psychometric Analysis
In line with previous findings,20 endorsement frequencies indicated that none of the patients endorsed the “impossible” response option for 49 of the 56 ABILHAND items in Wave 1 and 69 of the 79 ABILHAND plus supplemental items in Wave 2. As this lack of endorsement of one of the four categories could artificially inflate the extent of sub-optimal targeting for these analyses, the four-level response scale was rescored into three levels, merging the two higher categories (“very easy” – “easy” – “difficult/impossible”) for this analysis.
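The rescoring step can be sketched in a few lines. This is a minimal illustration rather than the RUMM2030 procedure used in the study; the dictionary, function names, and toy responses are hypothetical. It flags items for which no respondent chose a given category and collapses “difficult” and “impossible” into a single category.

```python
from typing import Dict, List

# Mapping from the four-level scale to the three-level scale described above.
RESCORE: Dict[str, str] = {
    "very easy": "very easy",
    "easy": "easy",
    "difficult": "difficult/impossible",
    "impossible": "difficult/impossible",
}


def rescore(responses: List[str]) -> List[str]:
    """Collapse four-level responses into the three-level scale."""
    return [RESCORE[r] for r in responses]


def unendorsed_items(data: Dict[str, List[str]], category: str = "impossible") -> List[str]:
    """Return the items for which no respondent chose `category`."""
    return [item for item, responses in data.items() if category not in responses]


# Toy responses only; item names and answers are not study data.
toy_data = {
    "item_a": ["very easy", "easy", "difficult"],
    "item_b": ["easy", "impossible", "very easy"],
}
toy_unendorsed = unendorsed_items(toy_data)
toy_rescored = rescore(toy_data["item_b"])
```

Collapsing categories in this way removes an empty category from the analysis without discarding any observed responses.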
Table 4 details the sample-to-scale targeting for the different scale versions at Wave 1 and Wave 2. Findings are presented in an interval 0–100 transformed score, based on the interval logit metric produced by RMT analysis. In alignment with the sample’s PDDS scores, the sample mean was consistently below the scale mean (<50), indicating that these patients lie on the lower end of the manual disability continuum. The supplementary items both in their draft and final form shift the sample measurement means closer to the scale mean for all three different versions of the scale (36.65 to 38.87, 36.88 to 37.51 and 35.39 to 41.01 for the ABILHAND, fine motor, and power scales respectively).
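The exact logit-to-score transformation is not given in the text. One linear rescaling consistent with a 0–100 item range centered on an item mean of 50 is sketched below, purely as an assumption: it supposes the item mean sits at 0 logits and that the item range is symmetric, spanning `half_range` logits on either side. The function name and parameter are hypothetical.

```python
def to_0_100(logit: float, half_range: float) -> float:
    """Map a logit location onto a 0-100 metric with the item mean (0 logits) at 50.

    `half_range` is the assumed distance, in logits, from the item mean to
    either end of the item range; it is an illustrative parameter only.
    """
    if half_range <= 0:
        raise ValueError("half_range must be positive")
    return 50.0 + 50.0 * logit / half_range


# Toy value: a person located 1 logit below the item mean on a scale
# whose items are assumed to span -4 to +4 logits.
toy_score = to_0_100(-1.0, half_range=4.0)
```

Under this assumed transformation, a sample mean below 50 simply means the average person location lies below the average item location.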
Table 4. Overview of Rasch measurement theory (RMT) sample-to-scale targeting results.
SD: standard deviation.
a: where the scale item range is set to range from 0–100 and item mean always at 50; b: patients for whom the scale items are too easy; c: final items as available at Wave 2.
The range of standard error (SE) associated with measurement is also reduced by the added supplementary items, indicating precision associated with measurement is increased. The highest SE associated with measurement is reduced from 5.46 to 2.89, 5.61 to 4.18 and 11.61 to 3.64 for the three respective scales (Table 4). Finally, the percentage of people at the ceiling (people for whom the scale items are too easy) is reduced by the supplementary items for the ABILHAND-56 and the Early MS Manual Ability sub-scales. Figures 2–4 display the relative improvements to sample-to-scale targeting graphically.
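As an illustration of the ceiling statistic, the sketch below computes the percentage of respondents at an extreme total score. The text does not specify how ceiling was operationalized, so this assumes that a person is at ceiling when their total score equals the score obtained by rating every item as maximally easy. The function name and toy scores are hypothetical.

```python
from typing import Sequence


def percent_at_ceiling(total_scores: Sequence[int], ceiling_score: int) -> float:
    """Percentage of respondents whose total score equals `ceiling_score`.

    `ceiling_score` is assumed to be the total obtained when every item is
    rated as maximally easy; respondents at this score cannot be
    distinguished from one another by the scale.
    """
    if len(total_scores) == 0:
        raise ValueError("total_scores must not be empty")
    at_ceiling = sum(1 for score in total_scores if score == ceiling_score)
    return 100.0 * at_ceiling / len(total_scores)


# Toy totals only; these are not study data.
toy_ceiling_pct = percent_at_ceiling([10, 12, 12, 8], ceiling_score=12)
```

Adding items that higher-functioning patients find less than maximally easy lowers this percentage, because fewer respondents reach the extreme score.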

Figure 2. ABILHAND-56 sample-to-scale targeting.

Figure 3. Fine motor sample-to-scale targeting.

Figure 4. Power sample-to-scale targeting.
Discussion
In this multi-phase, mixed-methods psychometric study, we identified 20 additional candidate items to help improve the ABILHAND-56’s ability to detect differences in manual ability in higher-functioning early RRMS patients. The robust development process included patient and clinician feedback as well as modern psychometric analysis.
Wave 1 in-depth qualitative research findings indicated that the majority of existing ABILHAND-56 items were well-understood and appropriate to this MS sample, confirming the ABILHAND’s relevance in this clinical population. In addition, we identified a rich pool of relevant manual ability concepts aligning with the previously-identified two-level fine motor and power manual ability conceptual framework.20 Clinical neurologists helped ensure that item development focused on the most clinically relevant supplementary items to expand the ABILHAND’s measurement range. Wave 2 patient interviews ensured relevance, understanding, and acceptability of the supplementary items, in addition to providing evidence for revision and refinement.
The macro-level psychometric analysis of the addition of the new items, based on RMT, suggests improved targeting in this higher-functioning RRMS sample, with lower ceiling effects and greater precision (the ability to discriminate different levels of manual ability). The analysis also provided evidence that an altered response scale is needed to further improve targeting for higher-functioning patients; this adaptation should therefore be considered for future MS studies using this scale.
A mixed method psychometric approach advances our understanding of content validity and helps ensure that a PRO instrument adequately reflects the patient experience in a given context.13,14 This process is vital to maximize clinical interpretability, particularly when scores derived from PROs are used to make decisions about the state of disease and treatment.35 Our study used a novel mixed methods approach that demonstrates how we can efficiently conduct psychometric research to empirically troubleshoot legacy PRO instruments to ensure they appropriately capture the targeted concept of interest in a specific context of use.14
Traditionally, PRO instruments are developed via a three-step approach moving through qualitative concept elicitation and cognitive debriefing, to quantitative field testing.36,37 However, we suggest this standard linear methodology limits our ability to efficiently construct items, elaborate upon response options, identify anomalies, and troubleshoot overall instrument design. Therefore, we advocate an integrated, iterative process, prior to PRO instrument field testing. Using this approach, we generated optimal supplementary items for the ABILHAND in MS, which could help improve the match between manual ability in this population and the range of manual ability measured by the instrument, and subsequently improve manual ability measurement and interpretation in MS studies. It is important that the supplemental items only be used in conjunction with the ABILHAND items, as they do not measure the full spectrum of MS manual ability on their own.
The outcome of this study has been the development of a potential new tool, which could be used in clinical practice and clinical trials to measure changes in manual ability in MS from the patients’ perspective. Attention to manual ability should be a central focus in clinical management and development of new therapeutic/clinical interventions, including emerging candidate reparative therapies.38 In the current MS research and treatment landscape, it is increasingly clear that measures need to be targeted to include the higher-functioning population, and need to be sensitive to changes relevant to their functional status, particularly in studies focusing on preserving physical ability of newly diagnosed MS patients or reversing the damage caused by the disease before irreversible axonal loss takes place.19,20 Findings from this multi-phase mixed methods study indicate that the Early MS Manual Ability items expand manual ability measurement to issues relevant to higher-functioning patients and therefore have the potential to increase sensitivity to detect subtle clinical change in these patients. The recent treatment effects observed with natalizumab on the 9HPT component of the primary endpoint in patients with advanced non-relapsing secondary progressive multiple sclerosis (SPMS) in the ASCEND trial highlight the importance of having robust clinical outcome assessments, including PROs, to measure treatment effects on upper extremity function.39
While our findings with Early MS Manual Ability are encouraging, they should be interpreted with consideration of the study’s limitations. The structure of the ABILHAND and Early MS Manual Ability item stem (“How difficult are the following activities”) is simple and function descriptions are brief; patients reported they were able to complete the items quickly, with few problems. However, given that the enhanced conceptual coverage in higher-functioning people with MS is achieved by adding 20 items to the existing ABILHAND-56, it will be worthwhile to explore the burden presented by the additional items in future studies. Given that inclusion criteria were based on self-report information and because of the small sample size of the RMT analysis, additional analysis in a larger clinically defined sample would help confirm the validity and generalizability of these findings. The scoring structure of the ABILHAND-56 and Early MS Manual Ability items is empirically supported by a psychometric analysis in one context only and requires further psychometric testing. Finally, the revised scoring structure improves but does not resolve all the measurement issues related to the original ABILHAND-56.
Through mixed methods psychometric research, we generated 20 supplementary items to improve the targeting of the ABILHAND-56 in higher-functioning MS patients. The qualitative and quantitative findings support the use of these items with the ABILHAND-56 in measuring manual ability in MS from the patients’ perspective. Further data from a larger clinically defined sample are needed to confirm the new items’ measurement properties.
Footnotes
Acknowledgments
The authors wish to acknowledge and thank the 118 patients who shared their MS stories with the research team, as well as JoAnne Liebeler, Catherine Podeszwa, and Sasha Spite, project interviewers.
Conflict of Interest
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Shih-Yin Chen, Jennifer Petrillo, and Carmen Castrillo-Viguera are employees of and stockholders in Biogen. Diego Cadavid was an employee of Biogen when the study was conducted. This work is not related to his current employment with Fulcrum Therapeutics. Farrah Pompilus, Sophie Cleanthous, Sara Strzok, Stefan Cano, and Patrick Marquis are employees of Modus Outcomes, which received payment from Biogen Pharmaceuticals to conduct this research. Stanley Cohan receives research support from Biogen, Novartis, Mallinckrodt, Sanofi-Genzyme, Genentech and Opexa, is a paid consultant and/or serves on advisory boards for Biogen, Sanofi-Genzyme, and Novartis, and has received speaking honoraria, travel expenses and meals/lodging from Biogen, Novartis, Sanofi-Genzyme, Acorda, and Genentech. Myla Goldman has received personal consultancy funds from EMD Serono, Genzyme, and Novartis, and institutional consultancy and/or research funds from Acorda, Biogen Idec, and Novartis Pharmaceuticals, and is grant supported by the National Multiple Sclerosis Society and National Institutes of Health (K23NS062898). Kiren Kresa-Reahl has received speaker’s bureau honoraria from Biogen, Novartis, TEVA, EMDSerono, Mallinckrodt, and Genzyme; has provided consultant services to Biogen, Genentech, and EMDSerono; and has received research support from Biogen, Novartis, Mallinckrodt, Genzyme, and Genentech.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was funded by Biogen.
