Abstract
Background
To understand changes in the underlying progression of early Parkinson's disease, it is important to derive estimates of the threshold for meaningful motor progression on the MDS-UPDRS Part III in OFF medication state.
Objective
To estimate this threshold using two approaches: anchor-based analyses, and clinical consensus via a modified Delphi panel.
Methods
For the anchor-based analyses, data from a Phase II clinical trial were used. Mean and median MDS-UPDRS Part III change scores were calculated for those participants rated as ‘Minimally worse’ on the Clinical Global Impression of Improvement (using the first visit rated as ‘Minimally worse’ or worse, and at Weeks 24 and 52). Cumulative data up to Week 104 were used to assess the difference between motor progressors’ and non-progressors’ change scores on motor-related outcomes. For the modified Delphi panel, a panel of 13 expert clinicians received an online survey in two rounds and provided responses anonymously.
Results
For the anchor-based analyses, estimates of meaningful change ranged from 4 to 6 points. Numerically worse change scores were identified on motor-related outcomes for participants who had experienced motor progression compared with those who had not. For the modified Delphi panel, consensus was reached in Round 2, with 92% of panelists agreeing that 5 points is suitable to define a clinically meaningful motor progression threshold.
Conclusions
Results of the anchor-based analyses and modified Delphi panel were consistent, supporting a meaningful motor progression threshold of a worsening of 5 points on the MDS-UPDRS Part III (OFF medication state) in an early Parkinson's disease population.
Plain language summary
When the symptoms experienced by people with Parkinson's disease are observed by others, such as doctors, they are called signs. The MDS-UPDRS Part III is a tool used by doctors to measure how severe motor signs are in people with Parkinson's disease. These signs include slowness, tremor, and rigidity. Doctors can use this tool whilst someone with Parkinson's disease is taking medication to manage their motor symptoms (known as ON medication state), or when the effect of the medication has worn off (known as OFF medication state). People with Parkinson's disease not receiving medication are in OFF medication state. It is important to assess symptoms/signs when the medication is not working to understand how the disease is progressing, and how new potential treatments may slow it down. In this study, the researchers wanted to understand what level of change in the MDS-UPDRS Part III score assessed in OFF medication state shows that the motor signs of someone with early disease have worsened. They looked at data from a clinical trial called PASADENA and found a change of 5 points on the scale to be meaningful for showing that motor signs were getting noticeably worse. They also asked a group of 13 expert doctors what they thought would be a meaningful change of points on this scale. The experts reached an agreement that a change of 5 points on the scale would be meaningful. This information can help doctors and researchers better understand and track Parkinson's disease in its early stages.
Keywords
Introduction
The Movement Disorders Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) Part III, a clinician-rated examination, is commonly used in clinical practice and clinical trials to assess the severity of core motor signs of Parkinson's disease.1–5 Despite the use of this measure to assess treatment efficacy in clinical trials, there is limited evidence or consensus supporting a specific threshold of change that would be meaningful at the individual patient level. Given the progressive nature of the disease, there is particular interest in the threshold for meaningful within-patient worsening, which can be used to evaluate the efficacy of treatments aiming to slow disease progression. Specifically, such a threshold can be used to define a motor-progression event, and the proportion of patients meeting this event at a given time point, or a time-to-event approach, can be used to evaluate treatment efficacy.
For clinical trials in Parkinson's disease, the time-to-event methodology has particular merit given the impact of symptomatic therapy regimen changes on MDS-UPDRS Part III scores. For example, comparing patients who had received dopaminergic therapy prior to their 12-month visit (n = 192) with those who remained off dopaminergic therapies by their 36-month visit (n = 17), Holden et al. identified a marked difference in the annual progression of mean MDS-UPDRS Part III scores (1.77 vs. 4.02 points per year, respectively). 6 Furthermore, if the investigational drug is effective in delaying disease progression, it is reasonable to hypothesize that changes in symptomatic therapy regimen would occur more frequently in the control arm, potentially masking the treatment effect due to the improvement in Part III score resulting from the symptomatic regimen change. For a time-to-event endpoint, however, if the progression event typically occurs prior to such changes in regimen, the impact is mitigated. 7
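The motor-progression event described above can be illustrated with a minimal sketch. It assumes that an event is a single visit at which the MDS-UPDRS Part III score has worsened from baseline by at least a chosen threshold; whether a trial would require confirmation at a later visit is not specified here. The function names, the data layout, and the example values are hypothetical.

```python
def week_of_motor_progression(baseline_score, follow_up, threshold):
    """Return the week of the first visit meeting the progression event.

    follow_up is a list of (week, part_iii_score) pairs. Returns None when
    no visit meets the threshold (the participant would be censored in a
    time-to-event analysis).
    """
    for week, score in sorted(follow_up):
        # Higher Part III scores indicate more severe motor signs,
        # so worsening is a positive change from baseline.
        if score - baseline_score >= threshold:
            return week
    return None


def proportion_progressed(participants, threshold):
    """Proportion of participants with a progression event at any visit."""
    events = [
        week_of_motor_progression(baseline, visits, threshold) is not None
        for baseline, visits in participants
    ]
    return sum(events) / len(events)


# Hypothetical participants: (baseline score, [(week, score), ...]).
participants = [
    (20, [(24, 23), (52, 26), (80, 24)]),
    (18, [(24, 19), (52, 21)]),
]
first_event_weeks = [
    week_of_motor_progression(baseline, visits, threshold=5)
    for baseline, visits in participants
]
```

Because the event is fixed at the first qualifying visit, a later improvement (for example, after a change in symptomatic therapy) does not alter the recorded event time.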
Previously, Horvath et al. (2015) 8 produced estimates of meaningful within-patient improvement and worsening across a range of baseline severities determined by Hoehn and Yahr stage. They used a Clinical Global Impression of Improvement (CGI-I) as the anchor measure and employed the ‘mean change method’ by calculating the mean MDS-UPDRS Part III change score for given levels of the anchor (e.g., ‘Minimally worse’). Additionally, they used the ‘Receiver Operating Characteristic curve method’ to identify the optimal MDS-UPDRS Part III cut-off score that would distinguish between participants rated as ‘No change’ versus either ‘Minimally improved’ or ‘Minimally worse’. For an early Parkinson's disease population (Hoehn and Yahr stage 1‒2), the mean change method produced an estimate of 4.63 for participants rated as ‘Minimally worse’, and the Receiver Operating Characteristic curve method suggested a change score of 4.5 for distinguishing ‘Minimally worse’ from ‘No change’. Given that only integer change scores are possible at the individual patient level, these estimates support the use of a threshold of 5 points for meaningful within-patient worsening (i.e., to define motor progression). These analyses were conducted using MDS-UPDRS Part III data collected in ON medication state. However, for the evaluation of a potential disease-modifying therapy, the use of data collected in OFF medication state may provide a greater understanding of the level of progression of the underlying disease. Thus, it is important to evaluate the threshold for motor progression using data from individuals either not receiving levodopa, or in an OFF medication state.
To complement anchor-based methods, modified Delphi panels can be used to seek consensus amongst clinical experts who have experience in administering the MDS-UPDRS Part III. The modified Delphi method provides a means of collecting informed judgments from experts through multiple rounds of anonymous feedback, or iterations.9,10 At least two rounds of data collection are needed, allowing participants to view the responses of others. 11 The Delphi method is considered to have four defining features: anonymity for those participating; iteration of concepts; statistical group response based on frequency of selections; and informed input from expert participants. 12 The advantages of the Delphi process are that it enables expert opinion to be collected in a non-confrontational manner, free from pressures of a group, while group opinions are fed back anonymously to the panelists. 13
The objectives of this study were to: perform an anchor-based estimation of the motor progression threshold for the MDS-UPDRS Part III (OFF medication state) using data from PASADENA, a Phase II, randomized, double-blind, placebo-controlled study in participants with early Parkinson's disease (NCT03100149) 14 ; and seek clinical consensus, via a modified Delphi panel, on the meaningful motor progression threshold for the MDS-UPDRS Part III (OFF medication state). Additionally, to provide further context on the meaningfulness of motor progression, a third objective was to explore differences in daily function between motor progressors and non-progressors, using data from PASADENA.
Methods
Study design and participant selection
PASADENA analyses
Data were taken from the PASADENA study (NCT03100149) in individuals with early Parkinson's disease (defined as Hoehn and Yahr stage 1 or 2) who were either treatment naïve or receiving monoamine oxidase-B inhibitors. The study design and the study participants have been described in detail elsewhere. 14 Analyses were conducted in all participants randomized in the study who received any amount of study treatment across all treatment arms, defined as the modified intent-to-treat population. For participants who initiated levodopa during the study, the ‘OFF medication state’ MDS-UPDRS Part III rating was assessed approximately 12 h after the last dose.
Modified Delphi panel
This study was a consensus-building, two-round, online survey, conducted using a modified Delphi technique. The study design followed a stepwise approach (Figure 1). There are no definitive guidelines on optimal panel size, with numbers ranging widely in the literature. 12 However, a panel consisting of 10–15 participants with similar backgrounds or expertise has been recommended as sufficient15,16; small, homogenous groups are more likely to retain group members. 17 Panelists were recruited from a range of geographical locations including the US, Canada, the UK, France, Germany, Italy, and Spain. Panelists were required to be clinical experts with a minimum of 5 years’ experience administering the MDS-UPDRS scale, as well as being qualified to administer the MDS-UPDRS in randomized clinical trials and having administered the MDS-UPDRS at least 50 times in the previous year or 300 times in their lifetime. Additionally, panelists were required to have an H-Index of at least 20.

Figure 1. Modified Delphi panel study design.
Clinical experts were recruited by Global Perspectives, a company that specializes in supporting clinical studies through patient and clinician engagement. The Delphi technique relies on engaging people who are knowledgeable about a specific topic and, therefore, purposive sampling was used. All Delphi participants were invited to complete each Delphi round, unless they indicated withdrawal from the study. Continuation by a participant (by following the link to the survey) was taken as consent to participate. Each participant was allocated an identification number as part of the anonymization procedure. Each participant received reimbursement for participating. A double-blind procedure was maintained: the participants were not informed who the study sponsor was, and the investigators did not know the identity of the participants.
Measures used in PASADENA analyses
MDS-UPDRS Parts II and III
The MDS-UPDRS is a multimodal scale assessing both impairment and disability and is separated into four subscales (Parts I‒IV). For this study, the data from Part II (a 13-item patient-reported assessment of motor-related experiences of daily living over the past week) and Part III (a 33-item clinician-reported motor examination) were used. For each question, a numeric score is assigned between 0‒4, where 0 = Normal, 1 = Slight, 2 = Mild, 3 = Moderate, and 4 = Severe. Scores were then summed to determine the total score for each of Parts II and III. 5
Schwab and England Activities of Daily Living scale (SE-ADL)
The SE-ADL is a single-item scale assessing activities of daily living on a scale ranging from 0% (‘Bedridden’) to 100% (‘Completely independent’), using 10% intervals. 18
Parkinson's Disease Questionnaire-39 items (PDQ-39)
The PDQ-39 is a 39-item, patient-reported questionnaire with eight domains: Mobility (10 items); Activities of daily living (6 items); Emotional well-being (6 items); Stigma (4 items); Social support (3 items); Cognition (4 items); Communication (3 items); and Bodily discomfort (3 items). 19 Questions ask about the frequency with which participants have experienced difficulties due to having Parkinson's disease “during the last month”. All items are scored from 0 (‘Never’) to 4 (‘Always’), with domain scores calculated as the sum of the constituent items transformed onto a 0‒100 scale. Only the activities of daily living and mobility domain scores were used in this study.
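The domain scoring described above can be sketched as follows. The sketch assumes that the transformation onto the 0‒100 scale is a linear rescaling of the item sum by the maximum possible sum for the domain, and it does not address missing items. The dictionary and function names are hypothetical; the item counts are those listed in the text.

```python
# Number of items per PDQ-39 domain, as listed in the text.
PDQ39_DOMAIN_ITEMS = {
    "mobility": 10,
    "activities_of_daily_living": 6,
    "emotional_well_being": 6,
    "stigma": 4,
    "social_support": 3,
    "cognition": 4,
    "communication": 3,
    "bodily_discomfort": 3,
}


def pdq39_domain_score(item_scores, domain):
    """Rescale the sum of 0-4 item scores onto a 0-100 domain score."""
    n_items = PDQ39_DOMAIN_ITEMS[domain]
    if len(item_scores) != n_items:
        raise ValueError(f"{domain} expects {n_items} item scores")
    if any(score not in range(5) for score in item_scores):
        raise ValueError("item scores must be integers from 0 to 4")
    # Maximum possible sum is 4 points per item.
    return 100.0 * sum(item_scores) / (4 * n_items)


# Hypothetical responses for the two domains used in this study.
mobility_score = pdq39_domain_score([2] * 10, "mobility")
adl_score = pdq39_domain_score([0, 1, 1, 2, 0, 2], "activities_of_daily_living")
```

Under this scoring, higher domain scores indicate more frequent difficulties.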
Anchor measures
Two measures included in the PASADENA study were considered for use as anchors: the Patient Global Impression of Change (PGI-C) and the CGI-I. Both measures are single items assessing how much the participant's global health has changed since the start of the study. The PGI-C is completed by the participant, and the CGI-I is completed by clinicians. Both instruments have seven categorical verbal response options: ‘Very much improved’, ‘Much improved’, ‘Minimally improved’, ‘No change’, ‘Minimally worse’, ‘Much worse’, and ‘Very much worse’. MDS-UPDRS Part II was not deemed to be suitable as an anchor, as it does not have a clearly interpretable definition of meaningful change. 20 Whilst thresholds have been estimated for MDS-UPDRS Part II, the uncertainty with those estimates would be carried over to the estimation of the meaningful progression threshold on MDS-UPDRS Part III. For this reason, simple ordinal scales, such as global impression measures, are typically preferred as anchor measures, as the meaningfulness is more directly interpretable from the response options (i.e., it is easier to interpret the meaningfulness of a rating of ‘Minimally worse’ than it is of a 3-point change on any combination of items).
Modified Delphi panel round 1
In the first round, the panelists were sent a link via email to a web-based version of the Delphi questionnaire in REDCap™, an online survey tool. In addition to the survey link, participants were provided with a copy of the MDS-UPDRS Part III for information purposes. The Round 1 questionnaire included background information, instructions for the survey, and a series of open-ended questions.
In this first round, the panelists were asked to indicate what change in score on the MDS-UPDRS Part III represented a clinically meaningful worsening (Question 1), and what change would reflect a clinically meaningful improvement (Question 2). Although the focus of the study was on worsening, the question for improvement was included to ensure that for the worsening question, only worsening was considered and not general change (i.e., the panelists were encouraged to consider both worsening and improvement separately). For Questions 3–5, panelists were asked to indicate if the change score for worsening would be the same for both Hoehn and Yahr stages 1 and 2, treatment-naïve patients and patients on stable symptomatic treatment, and ON or OFF medication state, respectively. These questions were included to help identify the applicability of a threshold across a broad population, and to set up the distinction of ON vs. OFF medication state for round 2. Finally, Question 6 asked panelists how important changes in MDS-UPDRS Part III scores would be when deciding to make changes to an individual's symptomatic treatment regimen. This question was included to provide more context on the meaningfulness of such changes. Panelists were asked to provide a brief rationale for each answer.
Modified Delphi panel round 2
In Round 2, panel members were sent a link to a second web-based survey in REDCap™. The content of the questionnaire was determined by the authors based on the results of Round 1.
Round 2 of the survey was split into two sections (both presented within the same survey). Section A included a summary of Round 1 results followed by five questions. For the first four questions, panelists were asked to indicate their level of agreement (from ‘Strongly disagree’ to ‘Strongly agree’) with specified score ranges for clinically meaningful worsening or improvement, dependent on both ON and OFF medication states. Again, improvement was included in addition to worsening, and ON medication state in addition to OFF medication state, to help the panelists consider the distinction when providing their perspective on a meaningful change threshold specifically for worsening in OFF medication state. Where panelists agreed with the statements, they were asked to indicate which value from the range was their single preferred value. A final question in Section A asked panelists to describe what the term “minimally clinically meaningful worsening” means to them and why.
Section B included a summary of the Horváth et al. study 8 and the anchor-based analyses described in this paper. This summary was followed by four questions asking panelists to indicate their level of agreement (from ‘Strongly disagree’ to ‘Strongly agree’) with: the use of ‘Minimally worse’ to define meaningful worsening, the findings from the anchor-based analyses, and both the range of 4–6 points, and specifically 5 points, as suitable thresholds for determining meaningful worsening on the MDS-UPDRS Part III (OFF medication state) score. Rationales for responses were also collected.
Analyses
All PASADENA analyses were conducted using SAS software, version 9.4.
Anchor- and distribution-based analyses
For anchor-based estimation, the anchor should be simple to interpret and sufficiently correlated with the target measure. 21 A threshold of 0.3, defined a priori, was used to define the lower limit of acceptability for correlations. 22 Spearman rank-order correlations between each of the anchor instruments and change in the MDS-UPDRS Part III score were assessed at Weeks 24 and 52. These visits were selected a priori to provide an earlier time point where a change in anchor category could reasonably be detected (Week 24), and a later time point where a greater number of individuals have progressed on the anchor (Week 52). Correlations were also calculated for Weeks 80 and 104 to provide further context on the relationship between the measures. This is important for the subsequent step of estimating the threshold, which involves a trade-off between sample size (i.e., requiring sufficient participants to have had a change on the anchor variable) and not overestimating the associated MDS-UPDRS Part III change. Additionally, empirical cumulative distribution function curves (i.e., curves for each anchor category, with cumulative proportion on the y-axis, and change from baseline MDS-UPDRS Part III scores on the x-axis) were produced. For use as a suitable anchor for determining motor progression, the response categories should have distinct curves (i.e., the response categories should be able to differentiate MDS-UPDRS Part III change scores; assessed qualitatively) for the ‘No change’ and ‘Worsening’ categories. Two final checks were conducted to assess the suitability of the anchor: the MDS-UPDRS Part III mean change scores should differ between the ‘No change’ and ‘Minimally worse’ categories, and MDS-UPDRS Part III scores at follow-up should differ from scores at baseline in the ‘Minimally worse’ category. 
For the former, 95% confidence intervals should be non-overlapping between the two groups (providing evidence that scores differ between the anchor categories of interest), and for the latter, a paired t-test should yield a significant result to indicate the score has likely changed.
To estimate the motor progression threshold, two methods were considered: the mean change method and the Receiver Operating Characteristic curve method. Each method has strengths and weaknesses. For mean change, if change scores in the anchor-defined meaningful change subgroup are normally distributed, then half of the individuals will fall below the threshold. This is problematic when the correlation of change between anchor and target measures is high, as we would expect the majority of individuals experiencing a meaningful change on the anchor to also have experienced a meaningful change on the target measure. Thus, at higher correlations, the mean change method likely overestimates the threshold. Conversely, when the correlation is weaker, using the mean may help to correct for the uncertainty regarding whether those who experienced a meaningful change on the anchor have also experienced a meaningful change on the target measure. For the Receiver Operating Characteristic curve method, the situation is reversed: the cut-off could underestimate the meaningful change threshold when the correlation is weaker. Indeed, one simulation study advised correlations ≥0.5 for Receiver Operating Characteristic-based approaches. 23 Therefore, for weaker correlations (<0.5) the mean change method was preferred, and for stronger correlations (≥0.5) the Receiver Operating Characteristic curve method was preferred. Given the correlations calculated were less than 0.5, the mean change method was used. Specifically, the mean and median MDS-UPDRS Part III change scores were calculated using data from the first visit at which individuals were rated as ‘Minimally worse’. 24 Given that participants report experiencing progression of motor features in early disease, and that relatively few participants were rated as ‘Much worse’ or ‘Very much worse’ across the duration of the study, the ‘Minimally worse’ category was considered appropriate for determining the motor progression threshold. 
As the intent was to define an estimate that could be used in a time-to-motor worsening endpoint, the first visit at which the anchor category for progression (‘Minimally worse’) was met was considered the most conceptually appropriate time point. Additional estimates were produced for Weeks 24 and 52.
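The mean change method at the first qualifying visit can be sketched as below. The sketch assumes that the qualifying visit is the first one rated exactly ‘Minimally worse’; how visits with more severe ratings were handled is not specified here. The record layout, function name, and data are hypothetical.

```python
from statistics import mean, median


def first_minimally_worse_changes(visits, category="Minimally worse"):
    """Collect each participant's Part III change score at their first
    visit rated in the given anchor category.

    visits is an iterable of (participant_id, week, cgi_i_rating, change)
    tuples, where change is the Part III change from baseline.
    """
    first_change = {}
    # Sort by participant and week so the earliest qualifying visit wins.
    for pid, week, rating, change in sorted(visits, key=lambda v: (v[0], v[1])):
        if rating == category and pid not in first_change:
            first_change[pid] = change
    return list(first_change.values())


# Hypothetical records for four participants.
visits = [
    ("P01", 24, "No change", 1),
    ("P01", 52, "Minimally worse", 4),
    ("P01", 80, "Minimally worse", 9),  # ignored: not the first such visit
    ("P02", 24, "Minimally worse", 6),
    ("P03", 24, "No change", 0),
    ("P03", 52, "No change", 2),        # never rated 'Minimally worse'
    ("P04", 52, "Minimally worse", 5),
]
changes = first_minimally_worse_changes(visits)
threshold_mean = mean(changes)
threshold_median = median(changes)
```

Restricting each participant to one visit keeps the estimate aligned with a time-to-event endpoint, in which only the first event counts.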
A commonly used distribution-based estimate, standard error of measurement, was also calculated, against which the anchor-based estimates could be assessed (i.e., to determine whether they are larger than a metric of measurement error). The following formula was used:25,26
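The formula is not reproduced in this version of the text. A widely used definition, assumed in this sketch rather than confirmed by the source, is SEM = SD × √(1 − r), where SD is the standard deviation of the scores (often taken at baseline) and r is a reliability coefficient for the measure. Which standard deviation and which reliability coefficient were used is not stated here; the inputs below are illustrative and are not study values.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), under the assumed definition."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Illustrative inputs only (not taken from the study).
example_sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
```

An anchor-based estimate that exceeds the standard error of measurement is larger than this metric of measurement error.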
Patient profile associated with motor progression
To provide further insights regarding the meaningfulness of motor progression, additional analyses in the same dataset were conducted to compare change scores on measures of meaningful daily function between motor progressors and non-progressors. If the motor progression threshold is meaningful, it was anticipated that greater progression in daily function would be observed in motor progressors versus non-progressors. An analysis of covariance, controlling for age, gender, and baseline outcome score, was conducted for each of the range of motor progression thresholds suggested by the anchor-based analyses. Outcome measures evaluated included MDS-UPDRS Part II, PDQ-39 activities of daily living and mobility domains, and the SE-ADL. These were selected as they measure motor-related concepts. Cumulative data from Weeks 52, 80, and 104 were used for MDS-UPDRS Part II and SE-ADL, and from Weeks 48, 56, and 104 were used for PDQ-39 domains (PDQ-39 was collected at different visits to MDS-UPDRS Part II and SE-ADL). As cumulative data were used, multiple data points from each participant, depending on availability of data, were included in each analysis.
Modified Delphi panel
In accordance with findings from a systematic review, consensus was defined a priori by percent agreement. 27 Alternative methods for confirming consensus were considered, such as the content validity ratio; however, such approaches are typically used for retaining elements across rounds and are more difficult to interpret. Given the targeted nature of this study, percentage agreement was considered the most appropriate and understandable approach for determining consensus. The percent agreement threshold was set at 70% to reflect a reasonable majority.
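The consensus rule can be expressed as a short sketch. It assumes that responses of ‘Agree’ and ‘Strongly agree’ both count toward percent agreement; the intermediate response labels and the split of responses in the example are hypothetical.

```python
# Responses counted as agreement (an assumption of this sketch).
AGREEMENT_RESPONSES = {"Agree", "Strongly agree"}


def percent_agreement(responses):
    """Percentage of panelists whose response counts as agreement."""
    n_agree = sum(response in AGREEMENT_RESPONSES for response in responses)
    return 100.0 * n_agree / len(responses)


def consensus_reached(responses, threshold=70.0):
    """Consensus is met when percent agreement is at or above the threshold."""
    return percent_agreement(responses) >= threshold


# Hypothetical panel of 13: 12 agreeing responses and 1 disagreeing response.
responses = ["Strongly agree"] * 5 + ["Agree"] * 7 + ["Disagree"]
agreement = percent_agreement(responses)
```

With 13 panelists, a 70% threshold requires at least 10 agreeing responses, since 9 of 13 is approximately 69%.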
The qualitative data collected were coded and analyzed in a structured way to reflect the objectives of this research. Coding of open-ended responses was structured to reflect information and corresponding clinical expert feedback. Categories of data were created for ease of management and for panelists to respond to in Round 2.
The quantitative data collected in Round 2 consisted of the level of agreement with items that emerged from the first round. These data were analyzed based on the definition of consensus set at the beginning of the Delphi process. Descriptive statistics were used to analyze response rates. Median and interquartile ranges were also used to characterize the frequency distributions of participants’ responses.
Results
PASADENA analyses
Participants
In PASADENA, a total of 443 participants were screened, and 316 were enrolled across the three treatment arms (placebo, n = 105; prasinezumab 1500 mg, n = 105; prasinezumab 4500 mg, n = 106). The number of participants who started or changed the dose or regimen of treatment for symptoms of Parkinson's disease by Week 52 was 29 (27.6%) in the placebo group, 27 (25.7%) in the prasinezumab 1500 mg group, and 31 (29.2%) in the prasinezumab 4500 mg group. For the current analyses, the treatment arms were pooled. The demographic and clinical characteristics of the participants at baseline have been described previously. 14
Identification and evaluation of anchors
Both the CGI-I and PGI-C were evaluated for suitability as anchors for determining a motor progression threshold on MDS-UPDRS Part III. The PGI-C was not sufficiently correlated with the target instrument (<0.3), and the empirical cumulative distribution function curves demonstrated substantial overlap, meaning it was not considered suitable for use as an anchor. The CGI-I correlations with MDS-UPDRS Part III were ≥0.3 at all visits (Weeks 24, 52, 80, and 104), above the recommended minimum correlation threshold (Table 1). 22
Table 1. Spearman rank-order correlation between MDS-UPDRS Part III change from baseline and CGI-I.
CGI-I: Clinical Global Impression of Improvement; MDS-UPDRS: Movement Disorders Society-sponsored revision of the Unified Parkinson's Disease Rating Scale.
Corresponding empirical cumulative distribution function curves for Week 52 showed adequate separation between the ‘No change’ and ‘Minimally worse’ categories, and between the ‘Minimally worse’ and ‘Much worse’ categories (Figure 2). Whilst there is overlap between some response levels, this is not particularly surprising, given that relatively few participants are expected to improve their overall condition (as Parkinson's disease is a progressive disorder), and because of the low sample size in participants experiencing change beyond minimal in either direction at this early stage of the disease. Additionally, the 95% confidence intervals at both Weeks 24 and 52 do not overlap between the ‘Minimally worse’ and ‘No change’ categories, providing further evidence that the anchor categories are distinct regarding MDS-UPDRS Part III change scores (Table 2). Furthermore, a paired t-test demonstrated that follow-up scores were significantly different from baseline for the ‘Minimally worse’ category at both time points (indicating that the scores at follow-up had, on average, changed from baseline). Based on these analyses, the CGI-I was determined to be suitable for use as an anchor for identifying a motor progression threshold on MDS-UPDRS Part III.

Figure 2. Empirical cumulative distribution function plot of the change from baseline in the MDS-UPDRS Part III score by CGI-I categories at Week 52. CGI-I: Clinical Global Impression of Improvement; MDS-UPDRS: Movement Disorders Society-sponsored revision of the Unified Parkinson's Disease Rating Scale.
Table 2. Anchor-based motor progression estimates for MDS-UPDRS Part III.
CI: confidence interval; MDS-UPDRS: Movement Disorders Society-sponsored revision of the Unified Parkinson's Disease Rating Scale; NA: not applicable; SD: standard deviation.
Motor progression threshold estimation
Overall, 251/316 (79.4%) participants had a CGI-I rating of ‘Minimally worse’ or worse at a post-baseline visit. For the motor progression threshold estimation, the mean and median MDS-UPDRS Part III (OFF medication state) change scores were calculated (Table 2). The mean change score for the aggregated time points was 4.98 points, with a median of 5 points. To support this, mean and median change values were also calculated separately for each visit, using only those participants for whom it was the first visit at which they were rated as ‘Minimally worse’. Only data from Weeks 24 (n = 93) and 52 (n = 73) were used, as these visits had the largest sample sizes (the next highest was Week 56, n = 23). The mean and median values support a range from a minimum of 4 points (based on the median value of the Week 24 data) to a maximum of 6 points (based on the median value of the Week 52 data). This is further supported by the distribution-based standard error of measurement estimate of 4.04. Based on the mean and median values of the aggregated data, a motor progression threshold value of 5 points is considered reasonable.
Patient profile associated with motor progression
Results of the analysis of covariance showed that, for MDS-UPDRS Part II, SE-ADL, and PDQ-39 activities of daily living change scores, the p-values for differences between motor progressors and non-progressors were below the nominal threshold of 0.05 (for all three definitions of progression [≥4, ≥5, and ≥6 points]), with motor progressors experiencing poorer outcomes (Table 3). For other related outcomes, such as PDQ-39 Mobility, motor progressors experienced numerically greater worsening, but the differences were not statistically significant (Table 3).
Table 3. Analysis of covariance results for MDS-UPDRS Part II and PDQ-39 using values in the ‘OFF state’ for the MDS-UPDRS Part III score.
Change from baseline in patients who had a change from baseline value of 5 in the MDS-UPDRS Part III Total Score at Week 52, Week 80, and Week 104.
Change from baseline in patients who had a change from baseline value of 5 in the MDS-UPDRS Part III Total Score at Week 48, Week 56, and Week 104.
MDS-UPDRS: Movement Disorders Society-sponsored revision of the Unified Parkinson's Disease Rating Scale; PDQ-39: Parkinson's Disease Questionnaire-39 items; SE: standard error; SE-ADL: Schwab and England Activities of Daily Living scale.
Modified Delphi panel
Panelists
A total of 13 panelists were recruited, and all 13 completed both Round 1 and Round 2. Panelists had an H-Index range of 20–119 (mean = 40) and a range of 8–40 years of experience in Parkinson's disease (mean = 19 years). Panelists were mostly male (n = 9; female, n = 4), and came from seven countries: Canada (n = 2), France (n = 2), Germany (n = 3), Italy (n = 2), Spain (n = 1), the UK (n = 1), and the US (n = 2). The panelists listed their job titles as: neurologists or physicians of neurology departments (n = 10), professors of neurology, neurodegenerative muscular diseases, or movement disorders (n = 5), and clinical investigators or researchers (n = 2), with three panelists listing multiple job titles.
Modified Delphi panel round 1
For Question 1 (clinically meaningful worsening), the score change ranged from 2–10 points, with a median of 5 points. The most common responses were a worsening of 4 or 5 points (both 23%). For Question 2 (clinically meaningful improvement), the score change ranged from 2–13 points, with a median of 4 points. The most common responses were an improvement of 4 points (38%) or 5 points (31%).
For Hoehn and Yahr stage (1 vs. 2) and treatment status (naïve vs. stable symptomatic), the majority of panelists indicated the score for worsening would be the same (62% and 69%, respectively). When asked if ON or OFF medication state impacts the clinically meaningful worsening threshold, 85% responded ‘yes’. Additionally, in the free text response following this question, two panelists spontaneously indicated that they had considered ON medication state in their responses to Questions 1 and 2, and one had considered OFF medication state (panelists were not directly asked which state was considered, so it is unknown which state the remaining panelists were considering when completing Questions 1 and 2). Lastly, panelists were asked how important changes to Part III scores are when deciding to make changes to a patient's symptomatic treatment regimen, with 15% stating ‘not at all important’, 69% stating ‘useful’, and 15% stating ‘essential’.
Modified Delphi panel round 2
The primary results from Round 2 are displayed in Figure 3, with secondary results displayed in Table 4. Consensus was reached after two rounds, with 92% of panelists agreeing that a 5-point increase in MDS-UPDRS Part III (OFF medication state) score is a suitable threshold for determining clinically meaningful worsening of motor signs.

Figure 3. Primary results from Round 2 of the modified Delphi panel. MDS-UPDRS: Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale.
Table 4. Secondary results from Round 2 of the modified Delphi panel. CGI-I: Clinical Global Impression of Improvement; MDS-UPDRS: Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale.
Section A
For ON medication state, 100% of panelists agreed that the change score for clinically meaningful worsening lies between 4–6 points, with the single preferred value being 4 (62%). All panelists (100%) agreed that the change score for clinically meaningful improvement lies between 3–5 points, with the single preferred value being 4 (62%). For OFF medication state, 92% of panelists agreed that the change score for clinically meaningful worsening lies between 4–6 points, with the single preferred value being 5 or 6 (both 46%). All panelists (100%) agreed that the change score for clinically meaningful improvement lies between 3–5 points, with the single preferred value being 4 (62%). For these four questions, no panelists strongly disagreed.
Panelists were also asked to provide a rationale for their selections. For the clinically meaningful worsening range for ON medication state, panelists who agreed or strongly agreed provided statements such as “would be functional/motor decrease in several different areas”, and “it is in line with my clinical experience and scientific data”. When asked to consider the worsening range for OFF medication state, panelists who agreed or strongly agreed made statements such as “generally yes, although worsening can be one point less in OFF state, as there are less points ‘to work with’” and “will represent a worsening in 4–6 items or marked worsening in 2–4 items”. One panelist disagreed, suggesting “it takes less delta in an off state due to decline in baseline function”. For the clinically meaningful improvement range in ON medication state, panelists who agreed or strongly agreed made statements including “a few points of improvement are important for patients” and “will represent an improvement in 3–5 items or a marked improvement in at least 3 items”. When considering the improvement range for OFF medication state, panelists who agreed or strongly agreed made statements such as “improvements in an off state are even more significant and remarkable” and “I don't think we should have different cut-offs ON or OFF”.
Section B
Most of the panelists (85%) agreed with the use of a category of ‘Minimally worse’ on a CGI-I to define minimally clinically meaningful worsening, with 92% of panelists agreeing with the conclusions of the previous research. As in Section A, 92% of panelists agreed that a change in MDS-UPDRS Part III (OFF medication state) score representing clinically meaningful worsening lies between a range of an increase in 4–6 points, with one additional panelist indicating ‘strongly agree’. The same 92% of panelists agreed that a change in MDS-UPDRS Part III (OFF medication state) score of 5 points represents clinically meaningful worsening. For these four questions, no panelists strongly disagreed.
In Section B, panelists were also asked to provide a rationale for their answers. For those panelists who agreed or strongly agreed to question 1, referring to the CGI-I minimally worse category to determine minimally clinically meaningful worsening, rationales, such as “this is a well-recognized rating scale” and “CGI-I rating here corresponds well with real life clinical evidence”, were provided. When asked to what extent they agreed with the conclusions of the previous research, panelists who agreed or strongly agreed provided rationales such as, “the methodology used is well established and the results are sound” and “I do agree with their conclusions. Minimal worsening is sufficiently sensitive and specific at the value of 5”. One panelist disagreed, giving the rationale “clinically worsening is very important for the patient, so 2–3 points would be an important level”.
Panelists who agreed or strongly agreed that a minimally clinically meaningful range for worsening on MDS-UPDRS Part III in OFF medication state would be in the range of 4–6 points provided statements such as “this will mean a worsening in several items, and therefore an overall worsening for early Parkinson's disease”, and “this is clinically meaningful worsening”. One panelist disagreed saying, “in the patient's view a worsening of 2–3 points is relevant”. For panelists who agreed or strongly agreed that a minimally clinically meaningful range for worsening on MDS-UPDRS Part III in OFF medication state would be 5 points, supporting statements were provided such as “5 points increases the sensitivity for the endpoint so there are no false positive ‘worsening’ assessments” and “5 points means a clear relevant change”. One panelist disagreed, again stating “in the patient's view a worsening of 2–3 points is relevant”.
Discussion
The anchor-based mean change method was used to identify a motor progression threshold range of 4‒6 points on the MDS-UPDRS Part III (OFF medication state). Comparison of change scores on motor-related measures, such as MDS-UPDRS Part II, between motor progressors and non-progressors (separately for change scores of ≥4, ≥5, and ≥6 points) provided further supportive evidence of the meaningfulness of this progression threshold. As a single threshold is required for determining a motor progression event for a time-to-event or progressor analysis endpoint, the data support the use of a 5-point increase. Interestingly, this value is consistent with the ON medication state threshold suggested by Horváth et al. (2015).8 Whilst the rate of progression may differ between ON and OFF medication states, these results suggest that the relevance of the burden resulting from worsening motor signs is consistent.
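The anchor-based mean change calculation referred to above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's analysis code: the record layout, the field names `cgi_i` and `part3_change`, the function name, and the toy values are all assumptions introduced here, and positive change scores are taken to denote worsening.

```python
from statistics import mean, median


def anchor_based_estimate(records, anchor_category="Minimally worse"):
    """Summarise target-measure change scores within one anchor category.

    `records` is a list of dicts with hypothetical keys:
      - "cgi_i": the anchor rating at the visit of interest
      - "part3_change": change from baseline on the target measure
        (positive = worsening)
    """
    changes = [r["part3_change"] for r in records if r["cgi_i"] == anchor_category]
    if not changes:
        raise ValueError(f"no records rated {anchor_category!r}")
    return {"n": len(changes), "mean": mean(changes), "median": median(changes)}


# Toy data, invented purely for illustration (not trial data).
toy_records = [
    {"cgi_i": "No change", "part3_change": 1},
    {"cgi_i": "No change", "part3_change": -2},
    {"cgi_i": "Minimally worse", "part3_change": 3},
    {"cgi_i": "Minimally worse", "part3_change": 6},
    {"cgi_i": "Minimally worse", "part3_change": 6},
    {"cgi_i": "Much worse", "part3_change": 11},
]

estimate = anchor_based_estimate(toy_records)
print(estimate)
```

Only participants in the chosen anchor category contribute to the summary, which is why the composition of that category at a given visit influences the estimate.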
After two rounds, the modified Delphi panel reached consensus (92%) that a threshold of 5 points on MDS-UPDRS Part III (OFF medication state) is acceptable to define a meaningful motor progression threshold. There was also consensus (85%) on the use of ‘Minimally worse’ as a suitable threshold for determining minimally meaningful worsening, supporting its use in the anchor-based analyses. The consistency of these findings provides greater confidence in the use of this threshold.
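As a worked illustration of how such agreement percentages arise with 13 panelists, the sketch below tallies responses on a 4-point agreement scale. The split between ‘Agree’ and ‘Strongly agree’, the function names, and the consensus cutoff of 0.75 are hypothetical choices made for this example; they are not taken from the study protocol.

```python
from collections import Counter

# Response levels counted as agreement (assumed for this sketch).
AGREE_LEVELS = ("Agree", "Strongly agree")


def agreement_rate(responses):
    """Return the fraction of responses that are 'Agree' or 'Strongly agree'."""
    if not responses:
        raise ValueError("no responses supplied")
    counts = Counter(responses)
    return sum(counts[level] for level in AGREE_LEVELS) / len(responses)


def consensus_reached(responses, cutoff):
    """True when the agreement rate meets or exceeds the chosen cutoff."""
    return agreement_rate(responses) >= cutoff


# Hypothetical panel of 13: twelve agreeing (split invented), one disagreeing.
toy_round2 = ["Agree"] * 8 + ["Strongly agree"] * 4 + ["Disagree"]

rate = agreement_rate(toy_round2)
print(f"{rate:.0%}")  # 12 of 13 panelists -> 92%
print(consensus_reached(toy_round2, cutoff=0.75))  # cutoff is illustrative only
```

By the same arithmetic, 11 of 13 panelists corresponds to 85%.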
In clinical trials, MDS-UPDRS Part III has historically been used within endpoints assessing change from baseline (e.g., mixed model for repeated measures), rather than in progressor or time-to-event analyses.1,2 Perhaps due to this, there is not much evidence of debate or disagreement over the appropriate threshold for determining within-patient worsening. This study, therefore, adds to the limited literature, providing an alternative methodology to the existing statistical approaches.8 Anchor-based estimation has also been conducted on the UPDRS-III (the prior version of the MDS-UPDRS Part III); however, these studies typically placed greater emphasis on estimating thresholds for improvement, or did not differentiate between improvement and worsening,28–30 with the exception of a study conducted in a daily practice setting, which found that 5 points (ON medication state) is a suitable progression threshold.31 The MDS-UPDRS Part III relies on the judgment of trained administrators to accurately capture the degree of severity of motor signs associated with Parkinson's disease. Whereas anchor-based approaches are data-driven, the modified Delphi panel allows experienced raters to reflect on the entirety of their extensive clinical experience with evaluating change on the MDS-UPDRS Part III. Furthermore, through consensus, the estimate is based not on the experience of one expert clinician but on group judgment. Such an approach provides a complementary method (to the traditional anchor-based methods) for assessing a meaningful motor progression threshold. In addition, given the inherent challenges with meaningful within-patient change estimation, convergence across multiple sources lends greater weight to a particular estimate.
During the review of this research, two main themes emerged that warrant further discussion. The first is the issue of time and, specifically, the risk of conflating how time affects the methodological principles of estimation with how it affects the practical principles of application. The methodology applied in this study attempts to answer the question ‘what is a meaningful worsening in MDS-UPDRS Part III score?’ and not ‘what is a meaningful worsening in MDS-UPDRS Part III score over X months?’. Essentially, the threshold is a practical estimate that denotes that ‘State B’ is meaningfully different from ‘State A’, acknowledging that ‘State A’ comprises all possible baseline states (i.e., that an individual's condition has deteriorated to the point that the current state is meaningfully different from the baseline state). This principle is agnostic of time, but the estimation methodology is affected by it. Indeed, we observe a difference in the median values at Weeks 24 and 52, with a lower score at the earlier visit. As the methodological principle is agnostic of time, it would be erroneous to conclude that such values should be applied to studies of corresponding length (i.e., 4 points for 24 weeks and 6 points for 52 weeks). Instead, we should consider why different estimates are produced. We use all patients at these visits within the anchor category of ‘Minimally worse’, but they may already have been rated this way at prior visits. Given the progressive nature of the disease, their MDS-UPDRS Part III scores may have continued to progress since this earlier visit without the condition worsening to the degree of being rated as ‘Much worse’. Thus, such values could lead to overestimation of the minimal threshold. Additionally, measurement error on the two measures can contribute to fluctuations in the relationship between the two scales over time. Whilst time is a methodological challenge for estimation, it is a fundamental principle for application.
For example, applying such a threshold to short-term follow-up (e.g., 6 months) is unlikely to be useful, as relatively few participants will meet the progression threshold. To address questions over such short time frames, alternative methodologies should be applied. Likewise, applying the threshold over long-term follow-up (e.g., >5 years) may also be problematic, given that most participants will meet the threshold, and thus a treatment effect is unlikely to be observable. Here, it may be more informative to identify a threshold for a more substantial worsening (i.e., corresponding to ‘Much worse’ or ‘Very much worse’ on the anchor) rather than a minimal one, depending on the aims of the study. The methodology used in this study is not appropriate for determining what threshold would be important for assessing a rapid decline versus a slow decline. Caution is required with the application of such thresholds to ensure they are used appropriately. Thus, when applying the threshold recommended in this study, the duration of the analysis period is critical. For example, the PADOVA study (NCT04777331), where the primary endpoint is time-to-confirmed-motor progression (≥5 points worsening in MDS-UPDRS Part III), has a blinded follow-up duration of at least 18 months. This study duration was evaluated to be suitable for identifying the required number of events for the studied population. This process of determining the appropriate clinical trial duration (where the threshold will be applied) based on the threshold (i.e., ensuring some but not all patients will likely meet it), rather than first determining a duration and then having to estimate a threshold that would be appropriate within a potentially narrow context, is critical to ensure that the results are meaningful and interpretable for the broader population.
For example, if we were instead attempting to estimate a threshold for rapid decline (e.g., for a two-month study duration), an achievable threshold would either likely be small enough to fall within measurement error (e.g., below one standard error of measurement), thus limiting the robustness of the results, or would require an optimized population of faster progressors to meet the threshold, thus limiting the ability to extrapolate the findings to a broader population.
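To make the reference to the standard error of measurement concrete, the sketch below uses one common formulation, SEM = SD × √(1 − reliability), and checks whether a candidate threshold exceeds it. The standard deviation and reliability values are hypothetical placeholders, not estimates from the PASADENA data, and the function names are introduced here for illustration only.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM under the common formulation SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def exceeds_measurement_error(threshold, sd, reliability):
    """True when a candidate change threshold is larger than one SEM."""
    return threshold > standard_error_of_measurement(sd, reliability)


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_SD = 10.0
HYPOTHETICAL_RELIABILITY = 0.91

sem = standard_error_of_measurement(HYPOTHETICAL_SD, HYPOTHETICAL_RELIABILITY)
print(f"SEM = {sem:.1f} points")
print(exceeds_measurement_error(5, HYPOTHETICAL_SD, HYPOTHETICAL_RELIABILITY))
print(exceeds_measurement_error(2, HYPOTHETICAL_SD, HYPOTHETICAL_RELIABILITY))
```

Under these assumed inputs, a very small threshold would sit inside measurement error, which is the concern raised for short study durations.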
The second theme that emerged is the appropriateness of pooling data across treatment arms for the purpose of meaningful within-patient change estimation. Such an approach is conceptually appropriate given: (1) even if effective, some patients receiving prasinezumab are still expected to experience meaningful motor progression and thus contribute to the estimation; (2) the intent is to provide an estimate for use in a clinical trial endpoint; and, critically, (3) that trial endpoints should evaluate treatment arms against the same criteria. Attempting to apply different thresholds in an a priori endpoint to different treatment arms is fraught with challenges, not least the assumption (rather than the tested hypothesis) that those receiving the potential therapy will respond differently to the control. However, there is evidence that the calculation of estimates may produce different values for treatment arms.28,32,33 For example, Pahwa et al. (2022)33 identified a higher threshold on their target measure in treated patients compared to those receiving placebo. They hypothesize that treated “patients may improve to such a degree that there is a reduced ability to pinpoint a minimally important change”.33 It is also worth noting that the correlation between anchor and target measures was stronger in the treated patients than those receiving placebo, which can impact the estimate produced. Regardless of the reason, such findings highlight a challenge with the estimation of such thresholds when the intent is to apply a single threshold in future clinical trials. This further highlights the importance of estimation across multiple studies to provide more information from which to triangulate a single threshold.
There were some limitations of the anchor-based estimation of the motor progression threshold. Ideally, a patient-reported anchor would have been used, as it provides a direct experience from which to establish meaningfulness of change. Clinician-report (via the CGI-I) may not accurately reflect a meaningful change from the perspective of the patient. Unfortunately, the correlation with PGI-C was below an acceptable threshold, but future analyses with more suitable patient-reported anchors would provide stronger evidence in support of a meaningful progression threshold. Although the correlations between MDS-UPDRS Part III and CGI-I were all above the pre-specified threshold, they were at the lower end of acceptability. This has implications for the estimation of the progression threshold. For example, the use of the ‘Minimally worse’ category to define the cohort from which our estimate was derived could be questioned. However, it is important to consider a number of factors in determining the appropriateness of an anchor, and the category (or categories) selected. For example, the separation in empirical cumulative distribution function curves and the non-overlapping confidence intervals for the ‘No change’ and ‘Minimally worse’ categories are supportive of its use. Furthermore, across a broader population, Horváth et al. (2015)8 identified a correlation of change of 0.71 between the MDS-UPDRS Part III and the CGI-I. Thus, it is likely that the lower correlation observed in our study is a consequence of ‘restricted range’ (e.g., see Bland & Altman 2011).34 Indeed, the majority of individuals were rated either ‘No change’ or ‘Minimally worse’ (e.g., 78.41% at Week 52). Had additional suitable anchors been available, consistency across estimates could help to mitigate such concerns and provide greater support for the proposed threshold.20,22 As well as the use of multiple anchors, the use of multiple datasets would also be beneficial in establishing a suitable threshold.
Convergence across datasets would provide greater confidence in an estimate, and may allay concerns with, for example, potential placebo effects, or other factors which could potentially impact estimation.
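The ‘restricted range’ effect mentioned above can be illustrated with a small simulation. The sketch below is purely synthetic: the variable names, the noise level, the sample size, and the ±0.5 restriction band are arbitrary assumptions, and the resulting correlations are not estimates for any real scale. It shows only that the same underlying relationship yields a weaker Pearson correlation when one variable is confined to a narrow band.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic "anchor" values and a "target" that tracks them with added noise.
anchor = [rng.gauss(0.0, 1.0) for _ in range(5000)]
target = [a + rng.gauss(0.0, 0.8) for a in anchor]

# Keep only pairs whose anchor value falls in a narrow band (restricted range).
restricted = [(a, t) for a, t in zip(anchor, target) if -0.5 <= a <= 0.5]

r_full = pearson(anchor, target)
r_restricted = pearson([a for a, _ in restricted], [t for _, t in restricted])

print(f"full range:       r = {r_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}")
```

The relationship between the two simulated variables is identical in both calculations; only the spread of the anchor values differs.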
For the modified Delphi panel, there are three main limitations with the current research. First, whilst the intention of providing information prior to Section B of Round 2 was to standardize awareness of the anchor-based estimation research (potentially justified, given Horváth et al. was referenced in the free text responses in Round 1), it could have been a primary factor in the level of agreement provided to questions in Section B. Of note, however, is that the proportion of panelists in agreement with the range of 4–6 points was the same in Sections A and B (92%). Second, only two rounds were conducted. A third round, to potentially demonstrate that the consensus was maintained, would have added extra weight to these findings. Third, the research was conducted as an anonymous survey rather than permitting panelists to openly discuss their opinions. This was done to avoid biasing opinions towards those offered by more established panelists, with a summary of Round 1 responses provided before Section A in Round 2, in order to convey the perspectives offered in Round 1. However, we cannot exclude that some panelists may have become aware of the participation of other panelists via interactions outside of this research study. Whilst not necessarily a limitation, the authors chose to use a 4-point scale rather than using a 5-point scale with the addition of a middle category of ‘Neither agree nor disagree’. It is possible that the introduction of this category might decrease agreement rates. The potential inclusion of this category was discussed; however, it was not deemed relevant, as disagreement can include not being convinced that there is sufficient evidence to support the position.
There are some caveats to the use of this motor progression threshold. First, it is not recommended to use this threshold to directly interpret the magnitude of the between-groups difference from a treatment comparison (e.g., analyzed using mixed model for repeated measures or analysis of covariance). Whilst differences above this value would certainly be meaningful (i.e., the average patient's outcome on treatment is meaningfully better than the average patient's outcome on the control arm), it should not be considered the minimally meaningful between-group difference required, given slowing decline in a progressive neurodegenerative illness is a desirable outcome of treatment (i.e., smaller between-group differences may still be meaningful). Second, caution should be taken when using this threshold for individual patient care. When used in a treatment comparison (e.g., differences in time to progression, or in the proportion of patients at a given follow-up visit), any measurement error in the definition of progression is assumed to be balanced between treatment arms. However, for a single patient, it is possible that the observed change in score does not reflect a true change. Thus, the motor progression threshold estimated in this study is recommended for use when aggregating data, for a time-to-progression or progressor analysis (i.e., proportion of motor progressors at a given follow-up visit) in the evaluation of an interventional therapy. Additionally, this estimate should only be used for progression (i.e., should not be considered as suitable for assessing improvement). Given the high degree of overlap between improvement categories, the anchor measures were not suitable for assessing meaningful improvement. Finally, this progression threshold is suitable for an early Parkinson's disease population (i.e., Hoehn and Yahr stages 1‒2), and to be applied to data in OFF medication state. 
For data collected in ON medication state, and/or for more severe populations, additional estimation (or alternative existing evidence, e.g., Horváth et al. 2015)8 should be explored.
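The aggregate use recommended here (the proportion of motor progressors at a given follow-up visit) can be sketched as follows. The data structure, field names, and toy scores are assumptions made for illustration; the 5-point default reflects the threshold proposed in this study, and the 4- and 6-point alternatives mirror the range examined.

```python
def is_progressor(baseline, follow_up, threshold=5):
    """True when the score has worsened (increased) by at least `threshold` points."""
    return (follow_up - baseline) >= threshold


def progressor_proportion(participants, threshold=5):
    """Proportion of participants meeting the progression threshold at one visit."""
    if not participants:
        raise ValueError("no participants supplied")
    n_progressors = sum(
        is_progressor(p["baseline"], p["follow_up"], threshold) for p in participants
    )
    return n_progressors / len(participants)


# Toy scores for a single hypothetical group (invented values, not trial data).
toy_group = [
    {"baseline": 20, "follow_up": 23},  # +3 points
    {"baseline": 18, "follow_up": 22},  # +4 points
    {"baseline": 25, "follow_up": 30},  # +5 points
    {"baseline": 22, "follow_up": 29},  # +7 points
]

proportions = {t: progressor_proportion(toy_group, threshold=t) for t in (4, 5, 6)}
print(proportions)
```

In a treatment comparison, the same function would be applied to each arm with the same threshold, in line with the principle that arms should be evaluated against identical criteria.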
The results of this study indicate that the motor progression threshold for MDS-UPDRS Part III (OFF medication state) likely falls in the range of 4‒6 points, with the mean change-informed estimates converging on 5 points. Taking into account the clinical consensus from the modified Delphi study, a motor progression threshold of 5 points is recommended. Such an estimate is suitable for use to define a motor progression event for a time-to-progression endpoint in an early Parkinson's disease population.
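A time-to-progression event based on this threshold could be derived along the following lines. This is a sketch under stated assumptions: the confirmation rule used here (the threshold must also be met at the next consecutive visit) is a hypothetical choice for illustration and is not taken from any trial protocol, and the visit schedule and scores are invented.

```python
def time_to_confirmed_progression(baseline, visits, threshold=5):
    """Return the week of the first confirmed progression event, or None.

    `visits` is a list of (week, score) tuples sorted by week. An event is
    counted at the first visit where worsening from baseline is at least
    `threshold` points AND the next visit also meets the threshold
    (assumed confirmation rule). None means no confirmed event was observed.
    """
    for (week, score), (_, next_score) in zip(visits, visits[1:]):
        meets_now = (score - baseline) >= threshold
        meets_next = (next_score - baseline) >= threshold
        if meets_now and meets_next:
            return week
    return None


# Invented example: an unconfirmed excursion at week 16, confirmed at week 32.
toy_visits = [(8, 22), (16, 26), (24, 23), (32, 27), (40, 28)]
event_week = time_to_confirmed_progression(baseline=20, visits=toy_visits)
print(event_week)

# A participant who never reaches the threshold has no event in this sketch.
no_event = time_to_confirmed_progression(baseline=20, visits=[(8, 21), (16, 23)])
print(no_event)
```

Requiring confirmation is one way to reduce the influence of single-visit measurement error on the definition of an event.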
Footnotes
Acknowledgements
The authors would like to thank the participants and site staff involved in the PASADENA study and the members of the modified Delphi panel for their participation. Editorial assistance for this manuscript was provided by mXm Medical Communications, funded by F. Hoffmann-La Roche Ltd.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors declare that F. Hoffmann-La Roche Ltd was the sponsor and sole funder of the study. F. Hoffmann-La Roche Ltd was involved in the study design, collection, analysis, interpretation of data, the writing of this article, and the decision to submit it for publication.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: DT is an employee of Roche Products Ltd. and shareholder of F. Hoffmann-La Roche Ltd. EWD is an employee of F. Hoffmann-La Roche Ltd. EM is an employee of Roche Products Ltd. AM, GP, NP, NS, and TN are employees and shareholders of F. Hoffmann-La Roche Ltd. SZ was an employee of F. Hoffmann-La Roche Ltd at the time the work was conducted. LB, RR, and SC are employees of Modus Outcomes Ltd.
Data availability
The datasets presented in this study can be found in online repositories. Qualified researchers may request access to individual patient-level data through the clinical study data request platform (https://vivli.org/). Further details on Roche's criteria for eligible studies are available here (https://vivli.org/members/ourmembers/). For further details on Roche's Global Policy on the sharing of clinical information and how to request access to related clinical study documents, see here (
).
