Abstract
Background:
Freezing of gait is a highly disabling symptom in persons with Parkinson’s disease (PwP). Despite its episodic character, freezing can be reliably evaluated using the FOG score. The description of the minimal clinically relevant change is a requirement for a meaningful interpretation of its results.
Objective:
To determine the minimal clinically relevant change of the FOG score.
Methods:
We evaluated video recordings of a standardized freezing-evoking gait course, i.e., the FOG score, recorded just before and 30 minutes after the intake of a regular levodopa dose, in a randomized, blinded fashion. The minimal clinically relevant response was defined as a value of one or more on a 7-step Likert-type response scale [–3; +3] that served as the anchor. The minimal clinically relevant change was determined by ROC analysis.
Results:
The 37 PwP (Hoehn & Yahr stages 2.5–4; 27 male, 10 female) were aged 68.2 years on average (range 45–80). Mean disease duration was 12.9 years (range 2–29 years). The minimum FOG score was 0 and the maximum FOG score was 29. Mean FOG scores were 10.6 before and 11.1 after medication intake, with changes ranging from –14.7 to +16.7. The minimal clinically relevant change (MCRC) for improvement based on expert clinician rating was three scale points, with a sensitivity of 0.67 and a specificity of 0.96.
Conclusions:
The FOG score is recognized as a useful clinical instrument for the evaluation of freezing in the clinical setting. Knowledge of the MCRC should help to define responses to interventions that are discernible and meaningful to the expert physician and to the patient.
INTRODUCTION
Freezing of gait (FOG) is a highly relevant clinical problem in the management of patients with Parkinson’s disease (PwP) [1, 2]. FOG contributes to falls [3], predicts the development of cognitive decline [4, 5], and determines quality of life [6].
Despite its clinical significance, the assessment of FOG has been insufficiently investigated [1, 8]. As the clinical evaluation of this elusive and episodic disorder is technically challenging, most larger studies rely on patient-reported outcome measures, such as the FOG questionnaire [9]. However, this instrument and its revision, the new FOG questionnaire [10], are unable to assess short-term changes of gait behavior as introduced by fast-acting compounds or deep brain stimulation, as they ask the patient to describe their experience during the last week. Smaller single-center research trials usually have the investigators record the patients on video and count episodes or determine the time spent frozen [11–13]. However, it has been established that researchers differ astonishingly widely in how they perceive and define FOG, leading to a high variance in the results of video analyses [14].
For these reasons our group started a new approach to evaluate the severity of FOG [15]. We developed a scoring system based on the hypothesis that festination and freezing, and their intermediate neighbor, i.e., trembling-in-place, are manifestations of a gait dysfunction spectrum which has been termed gait ignition failure [16]. The notion that festination has its firm place in phenomenology of patients with PD is rooted in historical accounts, as Parkinson, Charcot, Buzzard, Wilson and other careful observers of PwP predominantly report festination, and falls, e.g., [17]. Furthermore, recent research has provided ample experimental evidence that festination in the form of a sequence effect quite often precedes freezing episodes [18].
The FOG score has patients walk a simple course with 12 varied situations and tasks. A rater assigns a score of one for festinating gait, a score of two for trembling-in-place or akinetic gait, and a score of three when the situation/task is aborted or if a cue is needed to overcome the motor block. The FOG score has been used in numerous studies, not only by our own group but also by others [19, 20]. In 2016, the Movement Disorders Scale commission suggested its use for further research [7].
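The scoring rule just described can be sketched in a few lines. This is a minimal illustration, assuming that an item without festination or freezing is rated 0 and that the total is the plain sum over the 12 items; the function name and the example ratings are hypothetical and not taken from the study.

```python
# Item ratings as described in the text. The value 0 for an
# unremarkable item is an assumption made for this sketch.
NO_FINDING = 0
FESTINATION = 1
TREMBLING_OR_AKINETIC = 2
ABORTED_OR_CUED = 3

N_ITEMS = 12  # number of situations/tasks in the gait course


def fog_score(item_ratings):
    """Return the total score, assumed here to be the sum of all item ratings."""
    if len(item_ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(item_ratings)}")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each item rating must be 0, 1, 2 or 3")
    return sum(item_ratings)


# Hypothetical ratings for one recording (not study data).
example_ratings = [0, 1, 0, 2, 1, 0, 3, 0, 1, 2, 0, 1]
print(fog_score(example_ratings))  # 11
```

Under these assumptions the total ranges from 0 to 36.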
The minimal clinically relevant change (MCRC) is a statistical construct that allows for easy interpretation of study results [21]. In the context of FOG this is of high relevance, as there were repeated claims of efficacy for therapeutic strategies based on small but still statistically significant effects. For example, the MAO inhibitor rasagiline was hailed as a promising specific treatment based on a single case [22], or a 0.16 difference towards placebo in the UPDRS freezing item [23]. The clinical relevance, however, has been doubted by movement disorders experts, who currently do not recommend rasagiline as a specific treatment for freezing [2].
Thus, the MCRC indicates the smallest change on a scale that conveys relevant clinical information. Different ways of describing an MCRC have been suggested [21]. For this study we opted for an anchor-based approach using receiver-operating characteristic curve (ROC) analysis to determine the MCRC [24]. ROC analysis provides the sensitivity and specificity of any possible value for an MCRC.
In this rater-blinded study, we calculated the MCRC by evaluating the FOG score before and 30 min after a regular L-DOPA dose. The differences of the pre- and the post-dose scores were anchored on a Likert-type scale that allowed for a dichotomization of the therapy response as interpreted by a movement disorder expert, and by the patients themselves. We report the MCRC for improvement as this project and other therapeutic studies target functional gains brought on by interventions.
MATERIALS AND METHODS
Ethical approval and clinical setup
The study was conducted at the Schön Klinik München Schwabing (MSW), Munich, Germany. The MSW is a Parkinson specialty hospital that oversees about 1300 in-patients and 1500 out-patients a year. Prior to any experimentation we obtained the ethical approval to start this diagnostic study from the Ethics Committee of the Technische Universität München (TUM) (August 16, 2017; Az. 176/17). The study was conducted from August 2017 to January 2018.
Patients: Inclusion and exclusion criteria
We investigated in-patients with a diagnosis of PD according to the UK Brain Bank criteria [25]. Further inclusion criteria were regular experience of freezing of gait or festination and motor fluctuations. Exclusion criteria were a diagnosis of atypical or vascular parkinsonism, normal pressure hydrocephalus, inability to walk for five minutes, no experience of FOG or festination during the study, and a daily levodopa dose of less than 300 mg. Cognitive impairment was not considered an exclusion criterion, as there is evidence that FOG correlates with cognitive impairment and executive dysfunction [26, 27], and ruling out cognitively impaired FOG patients might even lead to excluding a specific pathophysiologically and phenomenologically relevant FOG subtype.
Procedure
Screening for patients was performed by history taking, clinical examination and interviewing of caregivers. Upon inclusion, the patients completed the FOG questionnaire [28] and the MoCA [29]. Next, patients were asked to carry out the pre-dose FOG score, which was recorded on video. Afterwards, patients took their regular levodopa dose according to their individual therapeutic regimen. Thirty minutes later, patients performed the post-dose FOG score. Immediately after this examination, both patient and rater independently evaluated the clinically observed change in freezing on a therapy response scale. The time window of 30 minutes between medication intake and the second FOG score was deliberately set to be narrow because we aimed to find minimal changes. We anticipated that complete or impressive remissions of the gait symptoms would not provide us with informative data on the MCRC.
Evaluation instruments
The FOG score has been described in detail in previous publications [15, 30]. For this study the patients were video recorded while performing the FOG score to enable a post-hoc blinded video rating by three raters (uf, ss, kz). FOG scores reported are means from all three raters.
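The averaging over raters can be illustrated as follows. This is a sketch with hypothetical ratings; the function name and the sign convention for the change (pre-dose minus post-dose, so that positive values indicate improvement) are assumptions made for this example.

```python
from statistics import mean


def mean_fog_score(scores_by_rater):
    """Average one recording's FOG score over all raters."""
    return mean(scores_by_rater)


# Hypothetical ratings of one patient by three raters (not study data).
pre_dose = mean_fog_score([12.0, 11.0, 13.0])
post_dose = mean_fog_score([9.0, 8.0, 10.0])

# Assumed sign convention: a positive value means fewer points
# post-dose, i.e. an improvement.
improvement = pre_dose - post_dose
print(improvement)  # 3.0
```

Averaging three integer ratings yields values in steps of one third, which would be consistent with non-integer score changes.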
We constructed a 7-step therapy response scale (TRS) to anchor the relevant changes of freezing of gait according to a movement disorder expert and according to the patients’ view. The TRS was a Likert-type scale with the levels [–3] dramatic worsening; [–2] moderate worsening; [–1] mild worsening; [0] unchanged; [1] mild improvement; [2] moderate improvement; and [3] dramatic improvement.
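For illustration, the TRS levels and the dichotomization into an anchor for improvement (mild or better improvement versus unchanged or worse) can be written down as below. The names `TRS_LEVELS` and `improvement_anchor` are hypothetical and serve only this sketch.

```python
# The seven TRS levels as listed in the text.
TRS_LEVELS = {
    -3: "dramatic worsening",
    -2: "moderate worsening",
    -1: "mild worsening",
    0: "unchanged",
    1: "mild improvement",
    2: "moderate improvement",
    3: "dramatic improvement",
}


def improvement_anchor(trs_value):
    """Return 1 for mild or better improvement, 0 for unchanged or worse."""
    if trs_value not in TRS_LEVELS:
        raise ValueError(f"not a valid TRS level: {trs_value}")
    return 1 if trs_value >= 1 else 0


print([improvement_anchor(v) for v in sorted(TRS_LEVELS)])
# [0, 0, 0, 0, 1, 1, 1]
```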
For comparison with a patient-reported measure we applied the FOG questionnaire. Cognitive impairment was parameterized by means of the MoCA.
Statistics
Patients are described by age, disease duration, Hoehn and Yahr stage [31], FOG questionnaire and MoCA. Descriptive analyses provide average and standard deviation for normally distributed data, and median and interquartile range if data were not normally distributed. Correlations between FOG score and FOG questionnaire, MoCA, or disease duration were estimated with Spearman’s correlation coefficient Rho. The MCRC was calculated by ROC analysis, using the dichotomized TRS (worsening or no response, 0 vs. mild or better improvement, 1) as event value and the change in the FOG scores as the test value. Optimal MCRC values can be chosen either according to highest sensitivity and specificity, or highest precision, i.e., the highest rate of true positive and true negative results [24]. As our purpose was to maximize the rate of correct clinical decisions we opted to report MCRCs based on precision. All calculations were carried out using XLSTAT 12.0 [32]. The significance level was 5%.
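The selection of a cut-off from the ROC analysis can be sketched as follows. This is not the XLSTAT procedure used in the study but a minimal re-implementation under stated assumptions: the change is expressed so that positive values mean improvement, a patient is classified as improved when the change reaches the cut-off, and the criterion called precision above (the share of true positive and true negative results, more commonly termed accuracy) is maximized. All data and names are hypothetical.

```python
def roc_table(changes, events):
    """Sensitivity, specificity and accuracy for every observed cut-off.

    changes: FOG score improvement per patient (positive = improved, assumed).
    events:  dichotomized anchor per patient (1 = improved, 0 = not improved).
    """
    positives = sum(events)
    negatives = len(events) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both anchor classes must be present")
    rows = []
    for cutoff in sorted(set(changes)):
        # Classify as improved when the change reaches the cut-off.
        tp = sum(1 for c, e in zip(changes, events) if c >= cutoff and e == 1)
        tn = sum(1 for c, e in zip(changes, events) if c < cutoff and e == 0)
        rows.append({
            "cutoff": cutoff,
            "sensitivity": tp / positives,
            "specificity": tn / negatives,
            "accuracy": (tp + tn) / len(events),
        })
    return rows


def best_cutoff(rows):
    """Pick the cut-off with the highest share of correct classifications."""
    return max(rows, key=lambda row: row["accuracy"])


# Hypothetical data for ten patients (not study data).
changes = [-6, -3, -1, 0, 1, 2, 3, 4, 5, 8]
events = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]

best = best_cutoff(roc_table(changes, events))
print(best["cutoff"], best["accuracy"])  # 2 0.9
```

Choosing the cut-off by the sum of sensitivity and specificity instead would only require changing the key in `best_cutoff`.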
RESULTS
Patient cohort
N = 40 patients fulfilling all inclusion and exclusion criteria were recruited. Three patients could not finish the trial, in one case due to painful spinal claudication and in two cases due to extreme bradykinesia and freezing, unexpectedly rendering the patients unable to walk during the examination, so N = 37 data sets entered our calculations, and are reported in Table 1.
Table 1. Cohort characteristics
Clinical observations during assessment
During the process of data acquisition some notable observations were made. We were surprised to see that almost half of the subjects (N = 18) had worsened 30 minutes post-medication, so we had to modify our assumption that the L-DOPA effect on freezing would regularly occur within the first 30 minutes post dosing.
We further saw that many patients were unable to distinguish between the freezing disorder and other symptoms of PD. We felt that especially the discrimination between FOG and bradykinesia posed a problem, the more so in patients who experienced motor blocks infrequently.
A third observation was a relevant discrepancy in the evaluations of patient and expert. A significant portion of the patients reported improvement when worsening of FOG scores occurred.
Results from pre-post clinical assessments
Patients scored a mean 10.6±7.4 pre-dose, and 11.1±7.5 post-dose in the FOG-scores. 30 minutes after intake of medication, we observed improvements (N = 16) and worsening of FOG scores (N = 18). Also see Fig. 1.

Fig. 1. Absolute FOG score changes.
The movement disorder expert evaluated the therapy response with a mean of –0.2 (SD 1.1). The patients saw therapy responses with a mean of 0.1 (SD 1.4). The TRS of patients and the expert correlated moderately (ρ = 0.49, p = 0.002). The dichotomized TRS that was used as the anchor showed moderate agreement between patients and the expert (Cohen’s κ = 0.43).
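Cohen’s κ corrects the observed agreement between two raters for the agreement expected by chance. A minimal sketch for two dichotomized ratings is given below; the function name and the example ratings are hypothetical and do not reproduce the study data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Share of cases in which both raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical dichotomized TRS values for ten patients (not study data).
expert = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
patient = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
print(round(cohens_kappa(expert, patient), 2))  # 0.35
```

In this example the raters agree in 7 of 10 cases, while 0.54 agreement would be expected by chance, giving κ = 0.16/0.46.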
The intraclass correlation coefficient between all three raters was ICC = 0.943 (CI 0.915–0.962). Spearman correlation between mean FOG score and FOG questionnaire was ρ= 0.268 (p = 0.11).
To further explore the congruency of the expert’s and the patients’ response assessments, we investigated the change of the FOG score in relation to the expert’s TRS (TRSEXP) and the patients’ TRS (TRSPAT). We found that larger changes in the FOG score were associated with stronger responses on the expert’s TRS (ρ = 0.58, p = 0.0002) and, more weakly but still significantly, on the patients’ TRS (ρ = 0.38, p = 0.02). See Fig. 2.

Fig. 2. Spearman correlations of the FOG score’s change with the TRS of the expert and the patients.
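Spearman’s ρ is the Pearson correlation of the rank-transformed data, with tied values receiving their average rank. Because the TRS has only seven levels, ties are frequent. The following sketch uses hypothetical function names and hypothetical data and computes ρ only, not the p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical FOG score improvements and TRS values (not study data).
fog_improvement = [-4.0, -1.3, 0.0, 2.7, 5.0, 9.3]
trs = [-1, 0, 0, 1, 1, 3]
rho = spearman_rho(fog_improvement, trs)
```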
In four cases there was significant disagreement between the patient’s TRS and the FOG score (e.g., reporting moderate improvement when the FOG score showed worsening). This kind of disagreement only occurred in one of the expert’s ratings.
Improvement and worsening through the eyes of expert and patient
The MCRC was calculated for improvement and worsening separately, using both the expert’s and the patients’ dichotomized TRS as event condition. Using the expert’s evaluation, the MCRC for improvement was determined to be 3, as this MCRC provided for a high specificity (0.96) and reasonable sensitivity (0.67). See Fig. 3. The MCRC for worsening was calculated as 5 (spec = 0.96; sens = 0.46). The results for the patients’ evaluations as well as for clinical worsening are depicted in Table 2.

Fig. 3. Left: Sensitivity and specificity for various changes of the FOG score. Right: ROC curve (true positive rate vs. false positive rate).
Table 2. MCRC for various constellations
Note that the experiment was not designed to delineate worsening of scores. Also note that patients’ MCRC comes with lower sensitivity compared to the expert’s rating.
DISCUSSION
The report of the MCRC further validates the FOG score as a clinical instrument to assess the severity of this gait disorder. Knowledge of the MCRC allows a movement disorder expert to come to a clinically sound judgement from a short clinical test and from their own observations. Thus, it allows the measurement and comparison of pharmacotherapeutic interventions, of deep brain stimulation, or of other therapeutic approaches.
We applied the ROC analysis as this would allow us to calculate clinimetric information and has been reported for other scales in previous research [21, 33]. For all MCRC cut-off values we found high specificity but only moderate sensitivity for prediction of outcome. For practical purposes this means that a positive prediction comes with a higher probability of the true result than a negative prediction. This seemed to us the more relevant clinical question. We highlight the MCRC for improvement as most therapeutic studies focus on the benefit brought on by interventions, so we consider an improvement of three points the clinical benchmark for a positive effect.
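One way to make this asymmetry concrete is through likelihood ratios, which are not reported in this study but follow directly from the sensitivity (0.67) and specificity (0.96) of the expert-based MCRC for improvement. The sketch below uses hypothetical function names.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """How much more likely a positive test is in responders than in non-responders."""
    return sensitivity / (1 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    """How much more likely a negative test is in responders than in non-responders."""
    return (1 - sensitivity) / specificity


# Values reported for the expert-based MCRC for improvement.
SENSITIVITY = 0.67
SPECIFICITY = 0.96

lr_pos = positive_likelihood_ratio(SENSITIVITY, SPECIFICITY)  # approx. 16.75
lr_neg = negative_likelihood_ratio(SENSITIVITY, SPECIFICITY)  # approx. 0.34
```

A positive likelihood ratio well above 10 indicates that an improvement of three or more points shifts the odds strongly towards a true response, whereas a negative likelihood ratio of about 0.34 shifts the odds only modestly against it.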
The FOG score fills an important gap, as in many projects FOG has only been evaluated from subjective patient reports, despite the recommendation of combining questionnaires with examiner-based tests [34]. While these patient-reported measures have their undisputed merits, the specific cohort of PwP with FOG is prone to cognitive problems [27, 35] and inadvertently introduces error into the evaluation of their disorder. Furthermore, even adding an instructional video about FOG did not improve the sensitivity and specificity of FOG assessment by non-demented patients using the new-FOG-Q [10].
In our patients, the FOG questionnaire did not correlate significantly with objective measurements of FOG. This result is in line with previous findings of Shine et al., where frequency and duration of FOG episodes reported by patients using the FOG-Q and the New-FOG-Q did not correlate with observer based video analyses [36]. In contrast, the FOG score showed a strong inter-rater reliability, which might result from the clear evaluation criteria based on the FOG phenomenology rather than the evaluation of frequency and duration of episodes [15]. Nevertheless, this finding must be interpreted cautiously, because here we are comparing two FOG measures that cover completely different stretches of time, with the FOG score covering a span of five minutes and the FOG questionnaire covering a whole week. This bias is important in any comparison of FOG score and FOG questionnaire but even more so in our study since many of our patients had undergone drastic medication changes in the week before.
We also observed four marked disagreements between the patient-reported outcome and the objective FOG scores. Three of these patients were cognitively impaired (MoCA scores 24, 24, 26 and 30, mean 26), but no more so than the overall study population (mean MoCA 25.7). We feel that some patients had trouble distinguishing between bradykinesia and FOG, which may have caused some misinterpretations on the therapy response scale. Possibly, the various subtypes of FOG, presenting with and without leg movements, were responsible for the patients’ difficulties in differentiating FOG from off-state akinesia [10]. Patient opinion-based MCRCs had lower but acceptable sensitivity, specificity, AUC and precision compared to expert rater-based MCRCs. From this we conclude that patients can be trusted on their judgement of freezing.
We observed that half of the patients experienced worse FOG 30 minutes after levodopa ingestion than before. We deliberately did not withdraw our patients from levodopa and then administer a high dose to them because that would have yielded dramatic improvements rather than the minimal changes that we wanted to observe. Also, we wanted to see worsening of symptoms in some patients; otherwise we would not have been able to report MCRCs for worsening.
Still, we would have anticipated that a larger number of patients would be improved after 30 minutes, keeping in mind that pharmacological studies usually report 15–30 minutes until symptom remission [37]. We believe that the main reason for this may have been that the patients we included were only hospitalized patients, many of whom were treated with insufficient levodopa doses at the time the examinations were performed. Waiting until they were sufficiently treated may have resulted in total remission of FOG or in the dramatic improvement mentioned before. Another factor may have been the phenomenon of beginning-of-dose motor deterioration which is a worsening of motor symptoms 10–20 minutes after levodopa intake lasting for about 10–20 minutes [38]. We feel that it would have been wise to include a 60-minute post-dose FOG score rating, because this could have provided us with insight on the number of patients with levodopa-induced FOG in our population. Levodopa-induced FOG is a rare entity [39] and can certainly not account for more than a few patients’ deterioration. However, the data had already been collected before the observations on the FOG score changes were made.
The MCRC of three points corresponds to previous research: Fietzek et al. reported a change of 11.5 one hour after an increased levodopa dose in a group of freezers [30]. Weiss et al. found an improvement of 13.5 in patients receiving combined STN and SNr stimulation [19]. Dagan et al. saw an improvement of 4.2 immediately after multi-site transcranial direct current stimulation [20]. Thus, all those interventions clear the benchmark for a minimal clinically relevant intervention, which is an important aspect as a number of research studies on pharmacological interventions in FOG have provided only equivocal results [23].
In order to improve the implementation of the recommended combined FOG evaluation consisting of PROMs and investigator-based tests [34], training videos of the FOG score could make an important contribution in the future. In a web-based approach, video examples could be useful for health professionals, physicians and researchers to assess the severity of the different leg movements according to the FOG score manual.
This report does not conclude the quest to solve the clinical puzzle of the evaluation of FOG. One of the underlying hypotheses for the construction of the FOG score was the pathophysiological proximity of festination and freezing. Such a concept is supported by a number of observations, among which are the description of the sequence effect [18, 40], the loss of symmetry and automaticity [41, 42] or EMG pattern analysis [43]. This leads to the critical issues of the definition of FOG [1, 44]. Recent research has strongly focused on the freezing episode, while far less research effort has been put into festination [45]. We would argue that there is much insight to be gained from the evaluation of a gait ignition disorder, as has been suggested by early thinkers on this peculiar gait phenomenon [16].
Ideally, festination and freezing should be measured in an objective fashion from body-worn sensors [46, 47]. As single freezing episodes are virtually indistinguishable from voluntary stops, index values or statistical markers might be a better way of objective measurement than counting freezing episodes. A cross-validation study for the FOG score using objective markers of the gait disorder is an unmet need.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest to report.
ACKNOWLEDGMENTS
We would like to thank the patients for their time and efforts to make this study possible. We are grateful to the Deutsche Stiftung Neurologie (DSN) and the Deutsche Parkinson Vereinigung (DPV) for their support in carrying out this project.
