Abstract
Background:
Duchenne muscular dystrophy (DMD) is a neuromuscular disease characterized by progressive muscle weakness, multiple system involvement and premature mortality. Clinical trials and natural history studies aimed at finding effective treatments for DMD are currently underway. Clinical trials in DMD typically include several outcome measures of motor function. Research sites and studies have been found to have slightly different operational definitions for a given functional outcome, resulting in different procedures and protocols for these measurements.
Objective:
The goal of this study is to establish agreement among experts in the field around best practices in collecting functional outcome data in DMD, and thereby to provide researchers and clinicians with guidance on these practices.
Methods:
A group of 30 experts in DMD with experience in the development and/or execution of lower extremity outcome measures for this population met face to face to identify incongruences in the collection of these data. This effort was based in the United States (US) and sponsored by Parent Project Muscular Dystrophy. Several discrepancies were categorized for each outcome. The outcomes included the 6-minute walk test, 10-meter walk/run, supine to stand, ascend 4 stairs, sit to stand, and the NorthStar Ambulatory Assessment. Following this meeting, an additional 39 experts in DMD (28 from the United States and 11 international participants) consented to participate in a Delphi survey to reach consensus on the protocols and execution of lower extremity outcomes; 32 of them began the first round.
Results:
Round one: of 70 operational questions surveyed, 45 (64%) reached >70% consensus. Round two: of 27 operational questions surveyed, 20 (74%) reached >70% consensus. The questions that did not reach consensus appear minor.
Conclusion:
With minor modifications in the collection of data across sites, outcomes could potentially be normalized across research studies. This would reduce excessive training for evaluators in trials and minimize differences between protocols. Consistency in protocols will promote more efficient study start-up, fewer errors in the administration of items across studies, and ultimately improved quality and reliability of the functional outcomes. The authors strongly advocate for the establishment of a “research network library” that could be utilized by all those performing clinical assessments and trials in DMD.
INTRODUCTION
Duchenne muscular dystrophy (DMD), a progressive neuromuscular disease, is the most common childhood muscular dystrophy, affecting 1 in 3,500 to 6,000 males born each year [1]. DMD is characterized by progressive muscle weakness and loss of motor function, leading to orthopedic, respiratory, and cardiac complications and premature mortality [2–4]. Considerable effort is being devoted to developing an effective treatment for DMD, with nearly 50 clinical trials and natural history research studies in progress [5]. Typically, clinical trials in DMD include tests of motor function as primary efficacy endpoints. These include the 6-minute walk test (6MWT), 10-meter walk/run (10 m), supine to stand (STS), ascending 4 stairs (4Stairs), sit to stand (Sits), and the NorthStar Ambulatory Assessment (NSAA). Each of these measures has well-documented reliability and validity. Overall, similar procedures are used to administer them across sites [6–20]. However, research sites and studies frequently have slightly differing operational definitions for a given functional outcome, with different procedures and protocols for measuring it.
This lack of consistency hinders the ability to compare data across trials and protocols. Comprehensive data sharing is needed to fully understand this rare and heterogeneous disease. In addition, data sharing may allow natural history data to minimize the burden of a placebo arm in some clinical trials. The Food and Drug Administration (FDA) has recently indicated that in certain circumstances, external control groups could be an acceptable alternative to placebo, but that “potential sources of bias, such as differences in encouragement during tests of physical performance or function, should be eliminated or minimized” [21]. Efforts are underway to amalgamate currently available natural history and placebo data into large data banks to improve understanding of the modern natural history of DMD and to provide comparative data for therapeutic clinical trial planning [22, 23]. The Duchenne Regulatory Science Consortium (D-RSC) is one of these endeavors working to provide resources for clinical trials. The D-RSC is collecting data from multiple sources in an effort to identify differences between databases [24]. Further, it is developing a DMD progression model that may decrease the sample size needed for clinical trials and accelerate drug development [25]. The FDA has promulgated stringent requirements for functional outcomes used in studies involving boys with DMD. These include the requirement that a clinical outcome assessment be “a well-defined and reliable assessment of a specified concept of interest for use in adequate and well-controlled studies in a specified context of use” [26].
A number of areas of ambiguity in the original descriptions of commonly used functional tests have given rise to differences in test implementation (and thus interpretation) across studies, evaluators, and sites. It is crucial that agreement is established across clinical trials, physical therapists, evaluators, and study protocols. This would allow data collected to be consistent across clinical sites, research studies, and research networks across investigative groups. To establish this agreement, there was a need to identify the discrepancies that exist between protocols and between evaluators at different clinical sites. Therefore, this study sought to use a modified Delphi method to determine expert opinions from a wide range of clinicians and clinical researchers around the world regarding best practice in the collection of functional outcome data in clinical trials for DMD. The Delphi technique is a unique way to obtain consensus among experts involved in the collection of such data [27–31]. This methodology has been adopted and commonly used to develop standardized guidelines, enhance decision making, and develop policies in medical, nursing and other health professions [27–31].
The goal of this study is to establish agreement among experts in the field around best practices in collecting functional outcome data in DMD and to provide researchers and clinicians with guidance on these best practices. This will facilitate the comparison and use of aggregated data across clinical and research sites, which has the potential to offer tremendous benefit to the community.
METHODS
Expert meeting
The first step was to identify and describe the current differences in the collection of comparable functional outcome data across DMD research sites. This was carried out through a 13-hour face-to-face meeting of thirty physical therapists and evaluators with expertise in assessing individuals with DMD. The meeting was held as a pre-conference event at the 2014 Parent Project Muscular Dystrophy (PPMD) Connect Conference, which took place in Chicago, IL, USA. The participating experts were all from the United States and had experience in conducting clinical evaluations as part of DMD research trials, conducting follow-up clinical visits, developing outcomes/tests, and/or providing clinical care for individuals with DMD.
The foundation for the ensuing meeting was established through a presentation that addressed the latest developments in our understanding of DMD and included a summary of ongoing and upcoming DMD research clinical trials. In addition, presenters discussed the outcome measures typically used in DMD research and clinical care, issues related to the nuances of measuring clinical outcomes, and the importance of developing standardized procedures and protocols for carrying out data collection in neuromuscular clinics and trials.
Meeting participants were then divided into assigned work groups of 5–7 individuals. Groups were tasked with identifying common discrepancies in standardization of procedures, protocols, data collection, and execution for each identified DMD outcome measure. Information generated by the work groups was summarized and prioritized with the entire group of experts. Importantly, the experts agreed that all outcomes were well established in this population with published, detailed instructions [6–20]. The identified discrepancies were minor and could be easily modified by an evaluator or study protocol (e.g., wearing shoes vs. no shoes, or changing the stop and start positions for a test).
Delphi survey questions were subsequently drafted based upon discussion group consensus from the meeting, then revised into a set of 117 items for inclusion in round 1 of the Delphi Survey of OutcomesDMD questionnaire. Because longitudinal studies and clinical trials have focused primarily on ambulatory boys with DMD, lower extremity outcome measures were selected as the focus of this survey. Six “typical” functional outcomes identified at the PPMD Annual Connect Conference were surveyed, with two additional queries on strength and range of motion (ROM). Discrepancies identified during the PPMD pre-conference on outcome measures determined the content of the OutcomesDMD Delphi Survey questionnaire, which consisted of multiple-choice, yes/no, and open-text questions. The goal of round one was to evaluate consensus among the participants and assess themes that emerged from the open-ended questions.
The OutcomesDMD Delphi Survey questionnaire was then sent to a second set of experts in DMD. This group included therapists from the US and 7 other countries across 4 continents (detailed below). Therapists from major centers outside of the US known for their care and research efforts in DMD were targeted. Inclusion criteria were: 1) a minimum of two years' experience in the treatment and/or assessment of individuals with DMD, or 2) involvement in clinical evaluations in neuromuscular clinical care or trials.
Thirty-nine of the 50 individuals contacted by e-mail met the inclusion criteria, signed an informed consent form approved by the Institutional Review Board at University of Florida to participate in the Delphi survey, and received a link to access the questionnaire online. These 39 individuals resided in the following countries: the United States (28 participants from 12 states), Canada (1), Australia (2), Belgium (2), Netherlands (1), United Kingdom (3), Argentina (1), and China (1). All participants received the questionnaire in English. Of the 39 who consented, thirty-two began the first round of the survey and thirty-one began the second round. Although the inclusion criteria were broad, all participants were found to have at least 5 years of experience, with the majority having between 15 and 20 years' experience working with boys and young men with DMD in test development, research trials, or clinical care.
There were two rounds of the survey. Each round was available online for 2 weeks, and participants received email reminders to complete the survey within the designated time frame.
Prior to survey distribution, consensus was defined as 0.70 (70% agreement among participants). In keeping with the Delphi methodology, results from round one were analyzed, and a revised instrument that included 42 items was developed [27–31]. The second round of questions was sent to all individuals who completed the first survey. The same procedures to establish agreement were followed for round two. One participant was lost to follow-up in the second round (Fig. 1).

The flowchart demonstrates the steps from preconference through round 1 and round 2. It depicts the number of participants (active at each stage and non-responders/lost to follow-up) and number of responses to Delphi survey for rounds 1 and 2.
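The consensus rule described in the Methods (a threshold of 0.70 fixed before survey distribution) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. It assumes that agreement is the share of the most frequent answer among the participants who answered a given question, that unanswered items are excluded from the denominator, and that the threshold is inclusive (the paper states both "0.70" and ">70%", so the boundary case is an assumption). The function names and the example responses are hypothetical, and questions allowing multiple selections are not handled.

```python
from collections import Counter

# Threshold stated in the Methods; treating it as inclusive is an assumption.
CONSENSUS_THRESHOLD = 0.70


def percent_agreement(responses):
    """Return (modal_answer, share) for one single-answer question.

    `responses` is a list with one entry per participant; None marks a
    participant who skipped the question (assumed excluded from the
    denominator).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None, 0.0
    answer, count = Counter(answered).most_common(1)[0]
    return answer, count / len(answered)


def reached_consensus(responses, threshold=CONSENSUS_THRESHOLD):
    """True when the modal answer's share meets the threshold."""
    _, share = percent_agreement(responses)
    return share >= threshold


# Hypothetical responses from 32 participants, 5 of whom skipped the item.
example = ["barefoot"] * 22 + ["shoes"] * 5 + [None] * 5
answer, share = percent_agreement(example)
print(answer, f"{share:.0%}", reached_consensus(example))
```

Excluding skipped items matters here because the paper notes that some round-one questions were answered by as few as 27 of 32 participants; a different denominator convention would shift borderline items across the threshold.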
RESULTS
To analyze the data, questions were defined as “operational” if the content was related to the procedures and execution of the test. All open-ended questions were considered opinion, and therefore there was no intent to reach consensus on those items. The results were divided according to outcome measure (Figs. 2–5). Several questions included in the survey permitted the selection of multiple answers. The percent agreement among participants on all questions was recorded for both rounds. A complete list of all questions and responses for both rounds of surveys can be viewed in Appendices A and B.

Delphi Survey results for the 6 Minute Walk Test for rounds 1 and 2. The vertical dotted line depicts 70% consensus from participants.

Delphi Survey results for the (a) 10 m walk/run and (b) Supine to Stand for rounds 1 and 2. The vertical dotted line depicts 70% consensus from participants.

Delphi Survey results for the (a) 4Stairs and (b) Sit to Stand for rounds 1 and 2. The vertical dotted line depicts 70% consensus from participants.

Delphi Survey results for the NSAA for rounds 1 and 2. The vertical dotted line depicts 70% consensus from participants.
In round one, 70 of 117 questions met the criteria as operational. Of these 70 questions, 45 (64%) reached >70% consensus. In round two, 27 out of 42 questions were operational. Of these 27 questions, 20 (74%) reached >70% consensus (Fig. 1). Across the 6 tests (6MWT, 10 m, STS, 4Stairs, Sits, NSAA) for lower extremity outcomes surveyed, several common areas of consensus or disagreement were identified. More than 90% of participants agreed the position of the subject at the start of the test should be consistent. Participants agreed that test instructions should be consistent with the instructions intended by the investigators who originally validated the tests. Articles published on the reliability and validity of these outcomes often do not include manuals with detailed instructions. This underlines the importance of consistent and standardized manuals of operating procedures across studies and study sites that embody the original guidelines for these outcomes without variation among evaluators. Over 83% of participants agreed that individuals being tested should receive continuous verbal cueing during testing to ensure best performance across all outcomes. While >86% recommended that the 10 m, STS, 4Stairs, and NSAA be performed consistently barefoot, 97% of experts recommended that athletic footwear (e.g., tennis shoes or supportive shoes) be worn for the 6MWT.
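The round-level consensus rates reported above follow directly from the question counts. The short Python sketch below reproduces that arithmetic; the helper name `consensus_rate` is hypothetical, and rounding to the nearest whole percent is an assumption about how the published figures were derived.

```python
def consensus_rate(reached, total):
    """Share of operational questions reaching consensus, as a whole percent."""
    return round(100 * reached / total)


# (questions reaching consensus, operational questions) as reported per round
rounds = {"round 1": (45, 70), "round 2": (20, 27)}

for name, (reached, total) in rounds.items():
    print(f"{name}: {reached}/{total} = {consensus_rate(reached, total)}%")

# Pooling both rounds gives the overall figure cited in the Discussion.
pooled_reached = sum(reached for reached, _ in rounds.values())
pooled_total = sum(total for _, total in rounds.values())
print(f"pooled: {pooled_reached}/{pooled_total} = "
      f"{consensus_rate(pooled_reached, pooled_total)}%")
```

Under these assumptions the script yields 64% and 74% for the two rounds and 67% (65 of 97) when pooled, matching the values stated in the text.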
Several findings were specific to the procedures and execution of individual outcome measures. For instance, experts did not reach consensus on whether subjects should view a video demonstrating the 6MWT procedures prior to the test, even for subjects performing the 6MWT for the first time (<60%). In addition, providing a designated rest period prior to the 6MWT did not reach consensus. However, participants agreed that if the study protocol listed it as a procedure, it should be done consistently (>83%). Two potential sources of variation for the 6MWT were the start position and the moment at which the timer should begin. Participants reached consensus on both of these areas. The recommended start position is with arms at the sides (100%) and a normal base of support without staggering of the feet (>90%). The 6MWT begins when the evaluator says “GO” (>70%) and not at the first movement of the subject. There was agreement on most required equipment for the NSAA and 4Stairs, with the exception of the testing surface (e.g., carpet vs. tile, use of a mat vs. floor). Experts did not reach consensus on some of the survey items. These items included when to stop a test on the 10 m and the Sits, how many trials of each outcome measure should be performed (e.g., 3 trials vs. 1 trial), and how to determine if “best effort was used by the subject”. Refer to Figs. 2–5 for a summary of all outcomes.
See Appendix A and B for specifics of each question and the responses from round one and round two surveys.
DISCUSSION
This study used the modified Delphi Survey methodology to identify discrepancies across research sites and clinical trials that may impact the comparison of data from multiple studies [27–31]. Experts were able to reach consensus on sixty-seven percent of all operational questions across rounds one and two, in regard to nuances in protocols and procedures in testing. Experts agreed upon consistency in several of the key areas of concern, and those items that did not reach consensus appear minor. This indicates that, with small changes in the collection of data across sites, outcomes could potentially be normalized across studies. An improved continuity of outcomes between studies would benefit trial sponsors as well as participants. The removal of small differences between study protocols would permit standardized testing protocols, along with a uniform approach to training and credentialing of study personnel. Standardized definitions and protocols would enable study sites to refer to a single detailed manual of procedures, which would be more economical and streamlined. Importantly, uniformly accepted tests and methods would reduce tester variability, which in turn could more easily enable investigators to discern a treatment effect with a smaller sample. Pivotal pharmacological trials require a comparison to a placebo or untreated control group, and assignment to a control group is frustrating for patients and families who commit to a considerable time burden for repeated testing. Uniform testing across longitudinal studies and clinical trials worldwide could potentially lessen the required sample size of an untreated or delayed treatment arm and minimize repetition of functional testing. While there is no formal research network to share data across clinical trials at this time, the Duchenne Regulatory Science Consortium (D-RSC) is leading a collaborative effort to accumulate clinical data into a common database [24]. In turn, the D-RSC data will be used to develop a comprehensive disease progression model that can potentially be applied to novel, adaptive designs for future clinical trials [25].
There were limitations to the Delphi Survey for OutcomesDMD, which should be considered in the interpretation of the results. This survey was developed in the US. While the survey was internationally distributed, there was a US bias in the results, with the majority of participants from the United States. Because therapists from several different countries were surveyed, cultural differences cannot be ruled out and may have influenced the study results. The first Delphi round had a large number of questions, and many open-ended questions were included. Not all participants answered every question, and the number of responses dropped as low as 27/32 for some questions in round one (details in Appendix A and B). Furthermore, some questions in round one were inadvertently repeated in other sections without the correct relabeling. This may have been confusing to participants and contributed to some unanswered questions. In addition, some participants communicated that the first survey was too long; this was rectified in round 2. In spite of the complications of this type of study, consensus was reached on 67% of all operational questions across two rounds of surveys.
RECOMMENDATIONS
One of the overall recommendations resulting from this study is to refine outcome measure protocols utilizing the Delphi study results in combination with accepted, established procedures for outcomes (see Appendix C). This would relieve evaluators in trials of excessive training on protocols that differ only in small nuances.
Consistency in protocols will promote more efficient study start-up, fewer errors in the administration of items across studies, and ultimately improved quality and reliability of the functional outcomes. Effort-dependent tests are innately variable, so it is important to control or manage differences in administration; doing so decreases variability and improves the overall quality of the clinical endpoints. The intent of this study is not to recommend a particular outcome for a study but to strongly advocate for the establishment of a “research network library” that could be utilized by all those performing clinical assessments in DMD. This standard protocol library would ideally include manuals of operating procedures that are consistent with accepted guidelines for commonly used lower extremity outcome measures combined with the results of this study (see Appendix C). Videos demonstrating proper performance/instructions for clinical trial operation in DMD would further strengthen this resource. A standard protocol library would be open to the DMD community with access by trainers, evaluators, and clinical trial investigators. This would allow for the collection of outcome measure data in a standardized manner across research centers, neuromuscular clinics and clinical trials. As trial sites become more international, there needs to be one internationally accepted protocol so that a shared resource of interpretations can be used, rather than each trial producing separate translations that further confound administration.
These additional tools would establish consistency among trainers and clinical research evaluators and provide a reliable worldwide longitudinal data bank, enriching our interpretation of these outcomes. Most methodology for the collection of lower extremity outcome data is consistent; however, the development of these resources could help to eliminate the minor idiosyncrasies from one study protocol to another. With standardization of training for the collection of data, including procedures and protocols, the hope would be that data could be shared worldwide across studies, improving our understanding of the progression of this devastating disease and enhancing the clinical trials focused on ameliorating and curing DMD.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
ACKNOWLEDGMENTS
The authors would like to thank Parent Project Muscular Dystrophy for their support of this project and perseverance to improve the quality of life for people and families living with Duchenne. Their efforts help to bring families, researchers and clinicians to the same arena.
