Abstract
Background:
Previous reviews indicated positive effects of resistance training (RT) on motor outcomes in Parkinson’s disease (PD). However, the included studies were inconsistent, and non-motor outcomes have scarcely been considered in reviews of RT in PD.
Objective:
To analyze the RT effects on motor- and non-motor outcomes in PD patients compared to passive and physically active control groups (i.e., other structured physical interventions).
Methods:
We searched CENTRAL, MEDLINE, EMBASE, and CINAHL for randomized controlled trials of RT in PD. After 18 studies had been identified, meta-analyses were conducted for the outcomes muscle strength, motor impairment, freezing of gait (FoG), mobility and balance, quality of life (QoL), depression, cognition, and adverse events. Random-effects meta-analyses were calculated using mean differences (MD) or standardized mean differences (SMD) with 95% confidence intervals (CI).
Results:
When comparing RT with passive control groups, the meta-analyses showed significant large effects on muscle strength (SMD = –0.84, 95% CI –1.29 to –0.39, p = 0.0003), motor impairment (SMD = –0.81, 95% CI –1.34 to –0.27, p = 0.003), mobility and balance (MD = –1.81, 95% CI –3.13 to –0.49, p = 0.007), and small significant effects on QoL (SMD = –0.48, 95% CI –0.86 to –0.10, p = 0.01). RT compared with physically active control groups reached no significant results for any outcome.
Conclusions:
RT improves muscle strength, motor impairment, mobility and balance, QoL, and depression in PD patients. However, it is not superior to other physically active interventions. Therefore, exercise is important for PD patients but according to this analysis, its type is of secondary interest.
INTRODUCTION
Parkinson’s disease (PD) is the second-most common neurodegenerative disease with a rapid increase of incidence and prevalence throughout the world [1]. The neuropathology of PD is primarily attributed to degeneration of dopaminergic neurons in the substantia nigra, but further neurotransmitter systems are involved. This leads to typical progressive motor symptoms such as bradykinesia, rigidity, tremor, and postural instability [2]. In addition to motor symptoms, PD is characterized by non-motor symptoms [3], including cognitive impairment, depression, and autonomic dysfunction.
The therapy of PD is primarily based on pharmacological and surgical interventions. However, these treatment options are not curative, and PD remains a progressive disease with a high risk of severe motor and non-motor disabilities significantly decreasing quality of life (QoL) [1]. For this reason, physical interventions have gained attention and are recommended as an effective adjunct to drug therapy in managing motor (e.g., muscle strength, aerobic capacity, balance, gait and mobility) and non-motor symptoms (e.g., QoL, depression, and cognition) [3]. Moreover, PD patients exhibit lower maximal strength levels compared to healthy controls [4]. Therefore, resistance training (RT), which is the main exercise intervention for gaining muscle strength, is an essential component in PD rehabilitation to counteract muscle deficits, falls and to improve functional capacity and activities of daily living (ADLs) [5].
Numerous randomized controlled trials (RCTs) have underlined the importance of RT and have demonstrated that RT is suitable for improving motor dysfunction [6], mobility [7], strength [8], balance [9], depression, and QoL [10] in PD patients. However, there is only one recent systematic review and meta-analysis from 2020 focusing solely on leg strength, balance, gait performance, and QoL, which improved significantly [11]. Other previous systematic reviews and meta-analyses from 2015 and 2016 [2, 13] included only a few studies, the results were inconsistent, and effects of RT on non-motor outcomes were neglected. Furthermore, prior meta-analyses, except for Roeder et al. [2], did not distinguish between physically active and passive control groups, a distinction that is important for examining the specificity of the results. The analysis by Roeder et al. indicated that RT leads to significantly greater strength compared to passive control groups but not to physically active control groups. Therefore, it is crucial to separate the control groups into either passive or physically active control groups. As a final aspect, adverse events (AEs) occurring in the course of RT and possible sex differences have been disregarded so far, though there is evidence that females and males respond differently to the same RT program. Possible reasons are differences in hormonal and pre-training levels, occupation, as well as daily physical activities [14].
Therefore, the aim of this systematic review and meta-analysis was to evaluate the effects of RT on motor (i.e., muscle strength, motor impairments, freezing of gait (FoG), mobility and balance) and non-motor outcomes (i.e., QoL, depression, and cognition) compared to passive as well as physically active control groups in PD. In addition, sex differences and safety of RT in PD patients were examined. Based on the findings from previous studies and meta-analyses [2, 15], we hypothesized that RT improves strength, motor impairment, FoG, mobility and balance, QoL, depression, and cognition in contrast to passive, but not to physically active control groups, in PD patients.
MATERIALS AND METHODS
This systematic review was pre-registered in the international database of prospectively registered systematic reviews in health and social care (PROSPERO; CRD42021242325) and adheres to the PRISMA guidelines for reporting systematic reviews and meta-analyses [16]. No review protocol was published.
This study was a follow-up to a Cochrane systematic review and network meta-analysis (see Roheger et al. [17] for the protocol). This Cochrane review is currently in process and compares the effects of different physical interventions for PD patients. In this follow-up research project, a systematic review and meta-analysis was conducted. The distinction is that this follow-up review focused on RT for PD patients only, whereas the Cochrane Review includes different physical interventions, which were compared using a network meta-analysis. Furthermore, this follow-up review included additional outcomes, such as muscle strength, depression, and cognition, as well as the consideration of possible sex differences.
Search strategy
The underlying search strategy was based on Roheger et al. (2021). The following databases and data sources were searched until May 2021: CENTRAL, MEDLINE, EMBASE, CINAHL, SPORTDiscus, AMED, REHABDATA, PEDro, EU clinical trials register, World Health Organisation, Clinicaltrials.gov, and ISRCTN. No language restrictions were applied. As an example, the specific search strategy for the MEDLINE database is described in Supplementary Table 1. Additionally, a manual search in previously published systematic reviews and meta-analyses examining RT in PD was conducted [2, 12].
Eligibility criteria
Both full-text and abstract articles were eligible if sufficient information on study design, patient characteristics and interventions was available.
Interventions, study design and comparators
The definition of RT was adapted from a taxonomy for categorizing exercise interventions [18] and covered all types of weight training inducing muscle contractions and an overload of the skeletal muscle. RT may have consisted of various training contents, settings, and training devices (e.g., strength machines, free weights, body weight, elastic resistance bands, sling exercises) and may have been delivered individually or in a group setting. At least 50% of the intervention sessions had to be supervised.
Only RCTs evaluating effects of RT against a passive or placebo control group or another physically active control group in PD patients were included. Passive control groups were defined as no treatment, usual care, or wait-list control, which do not impact the participant’s habitual routine. Placebo control groups included interventions which conducted either sham programs, such as low-intensity exercises with an insufficient training stimulus (e.g., RT intensity <20% one-repetition maximum (1RM) [19], or low-impact stretching [20]), or sessions without physical activity but an educational program (e.g., health information sessions). Physically active control groups include other types of structured and supervised physically active interventions, which were expected to induce an adequate training stimulus (e.g., balance training, endurance, aqua jogging). Trials that examined combined exercise interventions (e.g., RT and balance training), with RT being the primary component of the intervention, were also eligible. Studies that included non-physically active control groups (i.e., cognitive training) would have been included, but were not identified.
Participants
Studies investigating adult patients (≥18 years) with no restriction regarding sex or educational level and a clinical diagnosis of idiopathic PD from stage I–IV on the Hoehn and Yahr (H&Y) scale [21] were included. Patients without or with cognitive impairment were eligible for this study. Trials including subjects with atypical parkinsonism were excluded.
Outcomes
The primary outcome measure was muscle strength assessed by 1RM test [22] or isokinetic strength test [23]. The secondary outcome measures were motor impairment (e.g., measured with the motor section of the Unified Parkinson’s Disease Rating Scale (UPDRS-M) [24]), FoG (e.g., measured with the Freezing of Gait Questionnaire (FOG-Q) [25]), mobility and balance (e.g., measured with the Timed Up & Go Test (TUG) [26]), QoL (e.g., measured with the Parkinson’s Disease Questionnaire 39 (PDQ-39) [27]), depression (e.g., measured with the Beck Depression Inventory (BDI) [28]), cognition (e.g., measured with the Montreal Cognitive Assessment (MoCA) [29]), and AEs (documented at any time after patients’ randomization). For all outcomes, only short-term effects were considered, i.e., assessments conducted shortly after the end of intervention (≤six weeks post-intervention), due to limited data and a high heterogeneity between the timing of follow-up assessments.
Study selection
One review author (M.E.) initially screened and removed titles that did not meet the inclusion criteria. Afterwards, two reviewers (M.E. & A.K.F.) independently performed the screening of the abstracts and reviewed the full-text articles for eligibility. Full-text and abstract articles of eligible studies were obtained and evaluated. Furthermore, two authors (R.G. & E.L.) independently extracted all relevant data using a standardized data extraction form and cross-checked all the information afterwards. In the case of discrepancies, the authors discussed or consulted another author (A.K.F.) until consensus was reached. If further information was required, authors of corresponding studies were contacted. If authors did not reply to the request, they were contacted again once after ten days. The extracted data consisted of general study information (e.g., author, publication date), study characteristics (e.g., trial design, setting, power calculations, randomization process), patient characteristics (e.g., baseline demographic characteristic, number of patients recruited/allocated/evaluated), information about the intervention (e.g., type, frequency, length, session duration), outcomes (e.g., tools, timing of assessment), and further notes (e.g., funding, conflict of interest).
Risk of bias
Methodological quality was analyzed independently by two authors (R.G. & A.K.F.) using the Cochrane Collaboration’s tool for assessing risk of bias (RoB) in RCTs [30]. Six domains of bias were analyzed for each study. Selection bias consisted of the judgement of random sequence generation and allocation concealment. Performance bias included the rating of blinding of participants and personnel. Due to the nature of the included interventions, blinding of participants and personnel was not possible and was therefore rated as high RoB in all trials. Detection bias covered the blinding of outcome assessment. Attrition bias included the judgement of incomplete outcome data. Reporting bias described selective outcome reporting. The last domain included all other bias that was not covered in the prior domains. The judgement for each domain was classified as low, high, or unclear RoB. For the appraisal of the overall RoB, studies were judged as high RoB if two or more RoB domains were rated as high.
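The overall judgement rule described above can be illustrated with a minimal sketch. The domain keys and the function name `overall_rob` are illustrative assumptions and are not part of the Cochrane tool; the sketch also assumes that the two selection-bias items have already been collapsed into a single domain rating.

```python
# Six bias domains as named in the text; the dictionary keys are illustrative.
DOMAINS = ("selection", "performance", "detection",
           "attrition", "reporting", "other")
VALID = {"low", "high", "unclear"}

def overall_rob(ratings):
    """Return 'high' if two or more domains are rated 'high', else 'not high'."""
    for domain in DOMAINS:
        if ratings.get(domain) not in VALID:
            raise ValueError(f"missing or invalid rating for {domain!r}")
    n_high = sum(ratings[d] == "high" for d in DOMAINS)
    return "high" if n_high >= 2 else "not high"

# Hypothetical study: performance bias is 'high' in every trial because
# blinding was not possible, so one further high-risk domain is enough
# to reach the threshold of two.
example_study = {"selection": "unclear", "performance": "high",
                 "detection": "low", "attrition": "high",
                 "reporting": "low", "other": "low"}
print(overall_rob(example_study))
```

Because performance bias was rated high in all trials, the rule effectively flags any study with at least one additional high-risk domain.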
Data analysis
RevMan 5.4 for Windows software (Cochrane Collaboration, London, UK) was utilized to conduct the meta-analyses. A requirement for calculating meta-analyses was that at least two appropriate studies were available. If data were missing from a study, the authors were contacted to request all relevant missing data. This was the case for post-intervention means and SDs differentiated for women and men in each study. Of the 17 authors contacted, nine responded. Four authors provided the requested data [9, 31–33], whereas the remaining authors were not able to provide the data.
Finally, only the outcomes muscle strength, motor impairment, FoG, mobility and balance, QoL, and depression were included in separate meta-analyses. Due to limited data, cognition was not included in the meta-analysis. Separate meta-analyses were conducted for comparisons with passive or placebo control groups and for comparisons with physically active control groups. Trial arms from studies with multiple treatment groups were combined, as long as they were classified as subtypes of the same exercise intervention (cf. the adapted taxonomy of Lamb et al. [18]). Furthermore, subgroup analyses for sex differences concerning muscle strength and FoG were conducted via RevMan 5.4. No subgroup analyses were conducted for the other outcomes, because only limited data were available.
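The text does not state how trial arms were combined. One common approach, assumed here purely for illustration, is the formula for merging two groups into a single group given in the Cochrane Handbook. The function name `combine_arms` and the example values are hypothetical.

```python
import math

def combine_arms(n1, mean1, sd1, n2, mean2, sd2):
    """Merge two arms into one group (sample size, mean, SD).

    The combined SD accounts for both the within-arm variances and the
    difference between the two arm means.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    variance = ((n1 - 1) * sd1 ** 2
                + (n2 - 1) * sd2 ** 2
                + (n1 * n2 / n) * (mean1 - mean2) ** 2) / (n - 1)
    return n, mean, math.sqrt(variance)

# Hypothetical example: two RT arms of the same trial.
n, mean, sd = combine_arms(3, 2.0, 1.0, 4, 5.5, math.sqrt(5 / 3))
print(n, round(mean, 3), round(sd, 3))
```

The result is identical to the mean and SD that would be obtained from the pooled individual data of both arms, which is why this formula is preferable to simply averaging the two SDs.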
For the statistical analyses, mean differences (MDs) and 95% confidence intervals (CIs) were calculated for continuous outcomes operationalized with the same instrument. Otherwise, standardized mean differences (SMD) and 95% CIs were computed to compare effect measures of the intervention group with passive and physically active control groups. Effects from 0.2 to less than 0.5 were defined as small, effects from 0.5 to less than 0.8 as medium and effects from 0.8 and higher as large [34]. An alpha level of 0.05 was determined for all statistical tests. The post-intervention mean value or mean change from baseline, the standard deviation (SD) and the number of evaluated participants (n) of the outcome measurements in each intervention group were used for meta-analyses. If only standard errors (SE) or CIs were reported in the study, the corresponding SD was obtained as recommended in the Cochrane Handbook for Systematic Reviews of Interventions [35]. In studies with intention-to-treat (ITT) and per-protocol analyses, only the ITT data were used in the meta-analysis. In trials reporting no ITT data, the per-protocol data was utilized. Furthermore, whenever the number of patients evaluated for a specific outcome measurement was not presented, the number of patients randomized per treatment arm was used.
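The conversions and effect-size conventions described above can be sketched as follows. The sketch assumes the large-sample normal approximation for recovering an SD from a 95% CI (divisor 3.92) and computes the SMD as Hedges' adjusted g, which is the form generally reported by RevMan. The function names and the label "negligible" for effects below 0.2 are assumptions, since the text defines no category for that range.

```python
import math

def sd_from_se(se, n):
    """SD of a single group from the standard error of its mean."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, divisor=3.92):
    """SD of a single group from the 95% CI of its mean.

    Normal approximation; for small groups a t-based divisor would be
    more appropriate.
    """
    return math.sqrt(n) * (upper - lower) / divisor

def hedges_g(n1, mean1, sd1, n2, mean2, sd2):
    """Standardized mean difference with small-sample correction."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return correction * (mean1 - mean2) / pooled_sd

def classify_effect(smd):
    """Thresholds from the text: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed; not defined in the text

print(classify_effect(-0.84), classify_effect(-0.48))
```

Applied to the pooled estimates reported in the Results, the thresholds classify an SMD of –0.84 as large and an SMD of –0.48 as small, in line with the wording used there.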
For the assessment of statistical heterogeneity and inconsistency, the p-value from the χ2 test, the generalized I2-statistic and Tau2 were used (I2 0%–40%: not important/low heterogeneity; 30%–60%: moderate heterogeneity; 50%–90%: substantial heterogeneity; 75%–100%: considerable heterogeneity) [36]. A random effects model was used because some variation of the intervention effects was expected. Forest plots for each outcome measurement depicted the estimated treatment effect graphically and the z-value and p-value numerically. For meta-analyses with at least nine trials, funnel plots were generated to graphically identify possible publication bias. Sensitivity analyses were conducted to assess the robustness of the results by excluding studies with a high RoB (≥2 high RoB domains). Additionally, fixed effects models were evaluated as part of the sensitivity analysis.
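To make the heterogeneity statistics and the random effects model concrete, the following sketch pools study-level effects with the DerSimonian-Laird estimator, which is the method-of-moments approach commonly implemented for random effects models in RevMan. It is an illustration under that assumption, not the software's actual code; the function name `random_effects` and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def random_effects(effects, std_errors):
    """DerSimonian-Laird random-effects pooling with Q, I2 and Tau2."""
    k = len(effects)
    w = [1 / se ** 2 for se in std_errors]          # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    z = pooled / se_pooled
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "se": se_pooled, "ci": ci, "z": z, "p": p,
            "q": q, "tau2": tau2, "i2": i2}

# Hypothetical input: two equally precise studies with very different effects.
result = random_effects([0.0, 1.0], [0.1, 0.1])
print(round(result["i2"], 1), round(result["tau2"], 2))
```

Setting `tau2` to zero in the weights reproduces the fixed effects model that was used in the sensitivity analysis.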
RESULTS
A total of 21,965 records were identified through the database search, and another 19 records were found through previously published reviews and meta-analyses. After screening 19,412 records, 16,116 records were initially excluded, and 48 full-text articles were assessed for eligibility. Nineteen articles were excluded, and a total of 18 studies, which were published within 29 articles, were included in this systematic review. For the meta-analysis, 17 trials, published within 28 articles, were included (Fig. 1).

Fig. 1. Flow diagram of the study selection process and results.
Systematic review: Study characteristics
An overview of the general characteristics of the included studies is presented in Table 1. The 18 included studies comprised a total of 1,134 PD patients. Of these, 481 participants were randomized to the experimental group, 302 took part in a passive or placebo control group, and 351 were randomized to another physically active control group.
Table 1. Main characteristics of included studies
∘ = positive trend; • = significant time effect; * = significant time x group interaction. n, number of participants; F, female; M, male; SD, standard deviation; H&Y, Hoehn & Yahr [21]; e.g., for example; min, minutes; NA, not applicable; NR, not recorded; PD, Parkinson’s disease; RT, resistance training; C, control group; RM, repetition maximum; VO2max, maximum oxygen consumption; HRmax, maximum heart rate; reps, repetitions; kg, kilogram; PDQ-39, Parkinson’s Disease Questionnaire 39 [27]; PDQ-8, Parkinson’s Disease Questionnaire 8 [74]; UPDRS-M, motor section of the Unified Parkinson’s Disease Rating Scale [24]; MoCA, Montreal Cognitive Assessment; Digit Span, Digit Span Test [29]; MDS-UPDRS-M, motor section of the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale [75]; TUG, Timed Up & Go Test [26]; BDI, Beck Depression Inventory [28]; FOG-Q, Freezing of Gait Questionnaire [25]; N-FOG-Q, New Freezing of Gait Questionnaire [76]; HADS, Hospital Anxiety and Depression Scale [77]; HAM-D17, Hamilton Depression Scale [78].
In all studies, patients were eligible with a clinical diagnosis of PD. In addition, most studies included participants who were able to walk independently [6, 37–42] and had a stable use of medication [6, 41–43]. Four studies excluded patients who had received deep brain stimulation (DBS) [6, 44], 16 studies excluded participants with significant comorbidities (other neurological diseases, cardiovascular disease, hematologic or orthopedic disorders) [6, 45–48], and 14 studies excluded patients with cognitive impairments [6, 48]. The mean age of the study sample was 66 years and ranged from 57.0 [41] to 75.7 years [43]. Overall, 36% of the study participants were women. Participants with different stages of PD severity, measured with the H&Y scale, were included. The lowest H&Y stage was 0 [44] and the highest stage was 4 [39, 44]. Across the studies, the mean H&Y stage was 2.3, representing a sample with mostly mild bilateral symptoms. The combined average disease duration was 7.5 years, ranging from 5.5 [38] to 10.7 years [31].
The included studies used different exercise interventions and control groups. Nine studies compared RT to a passive or placebo control group receiving usual care, health information, a stretching program, or low-intensity exercises with insufficient stimuli to achieve a training effect [8, 47]. Six studies compared RT to another physically active control group such as general fitness programs with multiple exercises, yoga or gait/balance training [6, 45]. Three studies compared RT to both a physically active control group (tai chi, gait exercises and yoga) and a placebo control group (stretching and health education) [39, 44].
Of the 18 included studies, six studies used only weight machines for the RT program [8, 45], two studies designed their RT as a combination of weight machines and free weights [6, 47], one study combined strength machines and balance exercises during their RT program [40], and one study included weight machines and body-weight exercises [33]. Three studies conducted their RT program with the use of free weights only [32, 39] and three studies used a mix of free weights, elastic bands and body-weight exercises [43, 46]. In one study, the RT was performed with the use of body-weight exercises only [15] and one study combined body-weight exercises with balance training [37].
Eight studies were held as group sessions [9, 46], whereas nine studies used an individual approach [6, 47]. One study conducted an individual as well as a group intervention [40].
Although all studies basically described the exercises, only twelve studies defined concrete training parameters including sets, repetitions, and training intensity [6, 47]. The parameters of the RT programs aimed at improvements of strength-endurance (20–80% 1RM; 15–25 repetitions; 1–5 sets) and hypertrophy (70–85% 1RM; 6–12 repetitions; 1–3 sets).
Regarding training frequency, in 14 studies RT was conducted on two non-consecutive days of the week [6, 47], in three studies on three non-consecutive days [37, 46], and in one study the RT contained one training session per week [15]. The total number of training sessions varied between six [37] and 120 sessions [6].
The length of the interventions ranged from seven weeks [43] to two years [6], but in most studies, the interventions were carried out for eight to twelve weeks [8, 44–47].
13 studies reported the duration in minutes of one RT session. In nine studies the duration was approximately 60 minutes [6, 44], in three studies 45–50 minutes [9, 46], and in one study 90 minutes [15]. 14 studies provided data to calculate the total minutes of RT over the full intervention period [6, 46]. The lowest total minutes of RT were 720 minutes [15], the highest total minutes of RT during the intervention period were 15,120 minutes [6] and an average of 2,333 minutes was calculated.
All RT sessions were conducted in an outpatient setting (e.g., ambulant rehabilitation center, physical therapy clinic, university laboratory) except for two trials, which were carried out in a university laboratory [9] or in a gym [6]. Three studies applied an additional home-based RT [15, 44].
Systematic review: Outcomes
The included studies measured numerous outcomes. For this research project the outcomes muscle strength, motor impairment, FoG, mobility and balance, QoL, depression, cognition and AEs were relevant. However, further outcomes such as anxiety [38], fatigue [42], specific gait parameters [6, 42–44], cardiovascular parameters [46, 47], well-being [15], and peak oxygen consumption per unit time were also measured.
All outcomes were measured at baseline and shortly (within one week) after the intervention was finished. Five studies conducted additional follow-up assessments after four weeks [31], five weeks [43, 46], three months [39], and twelve months post-intervention [44], but these assessments were not included in the analyses.
Depending on the outcome, the heterogeneity of the utilized instruments differed. Muscle strength was reported in twelve studies but was assessed with a variety of devices and test procedures (e.g., body position, joint angles, unilateral or bilateral testing, number of sets). Five studies documented muscle strength as 1RM [37, 47], one as relative 1RM [41], one study as maximum voluntary contraction [43] and five studies as peak torque [6, 39]. For the outcome measurements of motor impairment, FoG, mobility and balance, and QoL, the variety of assessments was low: motor impairment was measured in eleven studies, of which nine used the UPDRS-M [6, 42–45] and two the motor section of the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS-M) [15, 31]. FoG was measured in four studies, of which three used the FOG-Q [31, 43] and one the New FOG-Q [9]. The outcome mobility and balance was assessed in ten studies by using the TUG [6, 42–44]. In total, eleven studies measured QoL. Nine studies used the PDQ-39 [6, 42–44] and two studies used its short form, the PDQ-8 [15, 39]. Regarding the outcome measurements of depression and AEs, the utilized instruments differed considerably: two studies used the BDI [42, 43], one study used the Hospital Anxiety and Depression Scale (HADS) [15] and one study the Hamilton depression scale (HAM-D17) [38] for assessing depression. In terms of the utilized instruments for assessing cognition, two studies [33, 46] used the MoCA as a screening tool for global cognition and one trial [6] the Digit Span Test operationalizing working memory. Although thirteen of the 18 studies reported AEs [6, 46], the type and amount of documentation varied considerably (Supplementary Table 2).
The effects of RT on each relevant outcome measure are illustrated in Table 1 and are divided into no effect, a positive trend, a significant time effect (change of scores at post-test compared with pre-test as within-group comparisons) and/or a significant time x group effect (additional change of scores for intervention group compared with control group at post-test as between-group comparisons). Of the twelve studies that analyzed muscle strength, all found improvements except for one [43]. For motor impairment, six studies found significant improvements after RT [6, 42] and five studies reported no significant effects [31, 44]. Concerning FoG, one of four studies found significant positive effects for PD patients after conducting RT [37]. Six out of ten studies reported significant improvements after RT for the outcome mobility and balance [8, 43]. In terms of QoL, six of eleven studies reported significant positive effects after RT [31, 44]. Four studies analyzed the effects of RT on depression and one study documented significant improvements after RT compared to a passive control group [38]. However, it should be mentioned that the other three studies compared RT with another physically active control group. Effects of RT on cognition were assessed by three studies, of which two found significant improvements in attention, working memory and global cognition [6, 33].
Seven studies reported AEs and gave detailed information about the events that occurred [6, 44] and six studies recorded no information regarding AEs [31–33, 42] (Supplementary Table 2). Of the studies that provided detailed information, five studies documented the AEs in a narrative form [6, 46] and two studies counted the number of AEs that occurred in each trial arm (e.g., number of falls) [39, 44]. Six studies included both serious and non-serious AEs [6, 46] and two studies reported serious AEs only [37, 42]. Regarding the analyzed interventions, ten studies examined AEs in both the intervention and the control groups [6, 46] and one study examined the intervention group only [38]. Most studies did not define the utilized assessments of AEs, except for three trials that used validated instruments (falls risk score; Falls Efficacy Scale-International (FES-I); falls diary) [37, 44]. Although 13 studies reported AEs, only one trial [6] documented serious AEs (bilateral hip replacement, two unilateral knee replacements on the same patient, knee surgery to remove old debris, foot surgery, and hospitalization after a fall), which could possibly be related to RT. Of the other studies, seven reported no AEs [31–33, 42] and five reported non-serious AEs (e.g., temporary mild knee pain; muscle soreness; drop in blood pressure) during the intervention [9, 46].
Risk of bias
The evaluation of the RoB of the included studies is depicted in Fig. 2 and Supplementary Figure 1. All studies used an appropriate method to generate a random allocation sequence. However, the allocation concealment was insufficiently described in 44% of the included studies, leading to an unclear RoB and a potential selection bias [8, 40–43]. Due to the nature of the interventions, no blinding of participants and personnel was possible during the study. Therefore, a high risk of performance bias was assigned to all included studies. The blinding of outcome assessors was appropriate in 72% of the included studies. Four studies provided no information about blinding of outcome assessments [33, 47] and one study stated that testers were unblinded [40]. This resulted in a possible detection bias. Outcome data were insufficiently reported in six studies: two studies failed to report sample sizes for the outcomes adequately, leading to an unclear RoB [8, 44]. In three studies, the outcome analyses were limited to randomized participants who completed the intervention or the outcome assessments (i.e., per-protocol analyses instead of ITT) [40, 43]. According to the trial record of another study, a sample size of 60 participants was planned, but the outcome analyses were based on only 35 participants [38]. These studies were rated to be at high risk of attrition bias. Selective reporting or other bias was not detected in any of the included studies.

Fig. 2. Risk of bias graph of included studies.
Meta-analysis: RT vs. passive and placebo control groups
The meta-analysis of the effects of RT on maximal muscle strength of the upper and lower extremities against passive and placebo control groups included nine studies and a total of 391 participants (Fig. 3). The SMD was –0.84 (95% CI –1.29 to –0.39), which represents a significant large effect in favor of the RT group (p = 0.0003). Substantial heterogeneity between the studies was detected (I2 = 74%).

Fig. 3. Forest plot: Resistance training vs. passive control groups; Outcome: Muscle strength.
For analyzing the effects of RT compared to passive and placebo control groups on motor impairment (assessed by the UPDRS-M and MDS-UPDRS-M), five studies comprising 350 participants were included (Fig. 4). The SMD was –0.81 (95% CI –1.34 to –0.27), which is defined as a significant large effect (p = 0.003) favoring RT. The heterogeneity was considerable (I2 = 78%).

Fig. 4. Forest plot: Resistance training vs. passive control groups; Outcome: Motor impairment.
Three articles, including 111 individuals with PD, evaluated the effects of RT on FoG in comparison to passive and placebo control groups (Fig. 5). The meta-analysis estimated an SMD of –0.31 (95% CI –0.69 to 0.07). Hence, the small RT effect on FoG was not statistically significant (p = 0.11). No heterogeneity was detected between the studies (I2 = 0%).

Fig. 5. Forest plot: Resistance training vs. passive control groups; Outcome: Freezing of gait (FoG).
The effects of RT on mobility and balance against passive and placebo control groups were investigated in six studies, including 393 participants (Fig. 6). The analysis showed an MD for the TUG (measured in seconds) of –1.81 (95% CI –3.13 to –0.49) and an overall statistically significant effect (p = 0.007) in favor of the RT group. The heterogeneity between the studies was substantial (I2 = 67%).

Fig. 6. Forest plot: Resistance training vs. passive control groups; Outcome: Mobility and balance.
Six studies, including 390 participants, compared the effects of RT with passive and placebo control groups on QoL (Fig. 7). The meta-analysis revealed an SMD of –0.48 (95% CI –0.86 to –0.10), which demonstrates a significant small effect (p = 0.01) favoring RT. Substantial heterogeneity was measured (I2 = 64%).

Fig. 7. Forest plot: Resistance training vs. passive control groups; Outcome: Quality of life (QoL).
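As a plausibility check, the z- and p-values of the pooled effects against passive and placebo control groups can be approximately recovered from the reported estimates and their 95% CIs. The sketch below assumes a normal approximation and symmetric intervals; the function name `p_from_ci` is hypothetical, and small deviations are expected because the published estimates are rounded.

```python
from statistics import NormalDist

def p_from_ci(estimate, lower, upper, level=0.95):
    """Recover the z-value and two-sided p-value from an estimate and its CI."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2)
    std_error = (upper - lower) / (2 * z_crit)
    z = estimate / std_error
    p = 2 * (1 - normal.cdf(abs(z)))
    return z, p

# Pooled estimates and 95% CIs as reported in the text.
reported = {
    "muscle strength (SMD)": (-0.84, -1.29, -0.39),
    "motor impairment (SMD)": (-0.81, -1.34, -0.27),
    "freezing of gait (SMD)": (-0.31, -0.69, 0.07),
    "mobility and balance (MD)": (-1.81, -3.13, -0.49),
    "quality of life (SMD)": (-0.48, -0.86, -0.10),
}
for outcome, (estimate, lower, upper) in reported.items():
    z, p = p_from_ci(estimate, lower, upper)
    print(f"{outcome}: z = {z:.2f}, p = {p:.4f}")
```

Under these assumptions, the recovered p-values are close to the reported values for all five outcomes, which indicates that the estimates, intervals, and p-values are internally consistent.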
Meta-analysis: Resistance training vs. physically active control groups
Five studies evaluated the effects of RT on maximal muscle strength of the upper and lower extremities against physically active control groups (Fig. 8). The physically active control groups were divided into two subgroups: balance, gait, and general fitness programs (Balance/Gait/Multi), which primarily aim at balance, mobility, and functional training, and Yoga/Tai Chi, representing a body workout with a focus on mental practices. A total of 291 participants were included in the analysis. The SMD was –0.02 (95% CI –0.25 to 0.21), indicating no statistically significant overall effect (p = 0.87). Therefore, no superior RT effect against a physically active control group was found, and no subgroup differences were detected (p = 0.29). Furthermore, no heterogeneity was detected (I2 = 0%).

Forest plot: Resistance training vs. physically active control groups; Outcome: Muscle strength.
For motor impairment, nine studies and a total of 635 participants were included to analyze the RT effect against physically active control groups (Fig. 9). The SMD was –0.02 (95% CI –0.26 to 0.22) and no significant effect was found (p = 0.86). Also, the subgroup analysis did not reveal any significant difference between RT vs. Balance/Gait/Multi and RT vs. Yoga/Tai Chi (p = 0.80). The heterogeneity was moderate (I2 = 48%).

Forest plot: Resistance training vs. physically active control groups; Outcome: Motor impairment.
Seven studies, including 567 subjects, evaluated the effects of RT in comparison to physically active control groups on mobility and balance (Fig. 10). The total MD for the TUG (measured in seconds) was 0.43 (95% CI –0.14 to 0.99) in favor of the physically active control groups, and the overall effect was not statistically significant (p = 0.14). Moreover, no statistically significant subgroup differences were detected between RT vs. Balance/Gait/Multi and RT vs. Yoga/Tai Chi (p = 0.98). There was no heterogeneity between the studies (I2 = 0%).

Forest plot: Resistance training vs. physically active control groups; Outcome: Mobility and balance.
Eight studies with 617 participants were included in the analysis of effects of RT against physically active control groups on QoL (Fig. 11). The total SMD was 0.23 (95% CI 0.05 to 0.40), which is a significant small effect in favor of the physically active control groups (p = 0.01). This overall effect was driven largely by the control groups that focused on mind-body exercises (Yoga/Tai Chi): these studies showed an SMD of 0.44 (95% CI 0.21 to 0.67) and a statistically significant result (p = 0.0002), whereas the balance, gait, and general fitness interventions did not differ significantly from RT (SMD 0.05, 95% CI –0.18 to 0.27; p = 0.68). These findings were also confirmed by a significant test result for subgroup differences (p = 0.02). The heterogeneity was low (I2 = 11%).

Forest plot: Resistance training vs. physically active control groups; Outcome: Quality of life (QoL).
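The test for subgroup differences on QoL can be approximated from the two subgroup estimates alone. The sketch below assumes that each subgroup's standard error can be recovered from its 95% CI and that the two subgroups are independent; it then applies a chi-square test with one degree of freedom. The function names are hypothetical, and the source does not state how the reported test was computed.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric 95% CI."""
    return (upper - lower) / (2 * z_crit)

def subgroup_difference_p(est_a, ci_a, est_b, ci_b):
    """Chi-square test (1 df) for a difference between two pooled subgroup estimates."""
    var_a = se_from_ci(*ci_a) ** 2
    var_b = se_from_ci(*ci_b) ** 2
    q_between = (est_a - est_b) ** 2 / (var_a + var_b)
    # For 1 degree of freedom, P(chi2 > q) equals the two-sided normal tail at sqrt(q)
    p = math.erfc(math.sqrt(q_between / 2))
    return q_between, p

# Yoga/Tai Chi: SMD 0.44 (0.21 to 0.67); Balance/Gait/Multi: SMD 0.05 (-0.18 to 0.27)
q, p = subgroup_difference_p(0.44, (0.21, 0.67), 0.05, (-0.18, 0.27))
print(round(q, 2), round(p, 2))
```

Under these assumptions, the p-value rounds to the reported p = 0.02 for subgroup differences.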
The effect of RT compared to a physically active control group on depression was evaluated in three studies, including 237 participants (Fig. 12). An overall SMD of –0.01 (95% CI –0.64 to 0.63), which represents no effect, was measured. Therefore, the overall effect was also not significant (p = 0.98). However, a significant subgroup difference (p = 0.002) was detected between the studies, indicating Yoga/Tai Chi as a superior intervention compared to RT (p = 0.003) in contrast to Balance/Gait/Multi vs. RT (p = 0.13) for reducing depression. A considerable heterogeneity was measured (I2 = 79%).

Forest plot: Resistance training vs. physically active control groups; Outcome: Depression.
Effects of resistance training differentiated for sex
Two studies with a total of 75 participants examined the effects of RT against passive control groups on muscle strength, differentiated for female and male participants, and were included in the analysis (Fig. 13). The SMD for female participants was –0.78 (95% CI –1.73 to 0.18), indicating a moderate effect in favor of RT. The analysis reached no statistical significance (p = 0.11), but a positive trend favoring RT was detected. For the male participants, the SMD was –0.67 (95% CI –1.46 to 0.12), which is also defined as a moderate effect size. This result was not statistically significant (p = 0.09). Furthermore, no statistically significant subgroup differences between female and male PD individuals were detected (p = 0.87), indicating no sex differences on muscle strength after a RT. No heterogeneity was detected between the studies (I2 = 0%).

Forest plot: Resistance training vs. passive control groups; Outcome: Muscle strength differentiated for sex.
Data examining RT against passive control groups on FoG differentiated for sex were provided by two studies, including 68 participants (Fig. 14). The SMD for the female participants was 0.52 (95% CI –0.25 to 1.29), which is a moderate effect in favor of the passive control groups. The result was not significant (p = 0.18). For the male participants, an SMD of –0.29 (95% CI –0.93 to 0.35) was detected, representing a small effect in favor of RT. Again, these findings were not significant (p = 0.37) and no significant differences between female and male participants were measured for FoG (p = 0.11). Again, no heterogeneity was detected (I2 = 0%).

Forest plot: Resistance training vs. passive control groups; Outcome: Freezing of gait (FoG) differentiated for sex.
Sensitivity analysis and publication bias
There were no differences between the fixed and random effects models with regard to statistically significant results (Supplementary Table 3). However, in two outcomes (RT vs. passive control groups: maximal strength and motor impairment) the interpretation of the SMD varied, resulting in a moderate effect size when using the fixed effects model compared to a large effect size when applying a random effects model (maximal strength: –0.60 vs. –0.84; motor impairment: –0.56 vs. –0.81). Moreover, the previously presented meta-analyses were repeated without studies that were judged as high RoB (≥2 RoB domains were rated as high RoB) (Supplementary Figures 2–9). The results revealed that there were no marked differences between the initial and the replicated outcome measures, indicating a high robustness of the findings. The funnel plots (Supplementary Figures 10 and 11) had an asymmetrical appearance, which may indicate that publication bias was present in the respective meta-analyses.
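Why a random effects model can yield a larger pooled SMD than a fixed effects model may be illustrated with a short sketch. It assumes inverse-variance weighting and the DerSimonian-Laird estimator of between-study variance, a common default; the source does not state which estimator was used. The input values are hypothetical toy data, not data from the included studies.

```python
def pool(effects, variances):
    """Inverse-variance pooling under fixed- and random-effects (DerSimonian-Laird) models."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # DerSimonian-Laird between-study variance, truncated at zero
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to every study's variance
    w_star = [1 / (v + tau2) for v in variances]
    random_eff = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return fixed, random_eff, tau2

# Hypothetical toy data (NOT from the review): four modest effects plus one small outlier study
effects = [-0.3, -0.5, -0.4, -0.6, -3.0]
variances = [0.04, 0.05, 0.04, 0.06, 0.30]
fixed, random_eff, tau2 = pool(effects, variances)
print(round(fixed, 2), round(random_eff, 2), round(tau2, 2))
```

In this toy example the random effects weights are more evenly distributed across studies, so the small outlier study pulls the pooled estimate further from zero than it does under the fixed effects model.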
DISCUSSION
This systematic review and meta-analysis examined the effects of RT on muscle strength, motor impairment, FoG, mobility and balance, QoL, depression, cognition, and safety aspects (AEs) in PD patients. The effects of RT were compared to either passive and placebo or physically active control groups. In total, the review included 18 RCTs and 1,134 participants. The main findings are that (i) RT improves muscle strength, motor impairment, mobility and balance, as well as QoL, in comparison to passive and placebo control groups in PD patients, confirming our hypothesis; (ii) RT shows no superior effects on muscle strength, motor impairment, mobility and balance, QoL, and depression in comparison to physically active control groups in PD patients, which is also in line with our expectations; (iii) there are no indications of sex-specific RT effects in PD patients; (iv) supervised RT is relatively safe, because either no or only non-serious AEs (e.g., temporary mild knee pain, muscle soreness) directly related to the intervention were reported.
Effects of resistance training in comparison to passive and placebo control groups
Participants conducting RT showed statistically significant small to large improvements of muscle strength, motor impairment, mobility and balance, as well as QoL compared to participants who were randomized to a passive and placebo control group. Only for FoG no significant results were measured. The effect of RT on depression and cognition is uncertain, because only one study each reporting these outcomes was available. Therefore, no meta-analyses were conducted. While the results of the RCTs revealed beneficial effects of RT on depression [38] and cognition [33] in people with PD, more RCTs are required to conduct meta-analyses in future studies.
The demonstrated large treatment effect on muscle strength in PD patients has been reported in previous reviews as well. Although former reviews did not differentiate their analyses between passive and placebo or physically active control groups, except for one study [2], analyzed treatment effects were found to be similar [11, 12]. Possible underlying mechanisms that have been discussed are that RT induces mechanical, metabolic, and neural stimuli, which lead to fiber hypertrophy, improvements of neural control (synchronization of motor unit recruitment, stimulation frequency of motor units, reduced autogenic inhibition), and alterations in the expression of PGC1-α and in mitochondrial function in skeletal muscle [49, 50]. It should also be noted that despite the high variety among the RT programs (duration: 45 to 90 min/session; frequency: 1–3 sessions/week; intensity: strength-endurance to hypertrophy; length: 7 weeks to two years), all included protocols in this review were effective in improving muscle strength in PD patients. This is particularly important, because research has shown that PD patients have reduced muscle strength, rate of force development, and force control, as well as an increased risk of falling [5, 51]. Therefore, these findings emphasize the importance of implementing RT in the treatment of PD.
Large effects were also observed for the outcome motor impairment. So far, only one meta-analysis, comprising three trials and 213 participants, has examined UPDRS-M scores [12]. Despite corresponding statistically significant effects in a meta-analysis by Chung et al. (2016) (SMD = 0.48; 95% CI 0.21 to 0.75; p = 0.0006), the SMD was 0.33 smaller in magnitude than the estimate from this meta-analysis (0.48 vs. 0.81). However, the reason may be that Chung et al. (2016) did not divide the control groups into active and passive comparators, which would be crucial to examine the specificity of the results when contrasting two different interventions. The large effect of RT on motor impairment may be based on an altered neural control and improvements in neuroplasticity [52, 53].
With regard to the RT effect on FoG, one former review [11] found a statistically significant improvement of 1.74 (95% CI –3.18 to –0.30; p = 0.02) on the FoG-Questionnaire score. Considering the very wide 95% CI and a small sample size of 163 participants, the certainty of this result was interpreted as low. This is also in line with the findings of this analysis, because only three trials and a total of 111 subjects were included and the 95% CI has a wide range. Therefore, the small effect, which was not found to be statistically significant, should be interpreted with caution and more studies analyzing possible mechanisms and effects of RT on FoG are necessary. For future studies, the FoG-Q and the New Freezing of Gait Questionnaire should not be used as primary outcomes, because they were found to be insufficiently sensitive to detect small effect changes in clinical trials. Instead, other robust and objective FoG outcome measures should be utilized (e.g., FoG video annotations and wearables) [54].
Mobility and balance are further substantial abilities for managing ADLs that are commonly affected by PD. The association between mobility and balance and muscle strength has been demonstrated in numerous studies [55, 56], so that it can be assumed that mobility and balance benefit from RT as well. Again, this may particularly be the case, because RT affects neural control and induces neuroplasticity, which contribute to improvements of balance and walking [57]. This suggestion was supported by this meta-analysis: The overall effect was statistically significant, and the MD demonstrated an improvement of 1.81 seconds (95% CI –3.13 to –0.49) on the TUG after completing a RT program. In a different meta-analysis [11], the overall effect was also statistically significant, and the MD was –1.17 seconds (95% CI –2.27 to –0.08) favoring RT, but again, the comparator included both active and passive control groups, limiting the comparison between the two estimates. The mean baseline score of the TUG from all included studies in this review was 12.7 seconds, which represents a typical mean score for PD patients [58]. Furthermore, evidence shows that PD patients with a TUG score higher than 11.5 seconds are at risk of falling [59]. Although an individually varying range, rather than a fixed cut-off value, seems more appropriate for classifying subjects as being at risk of falling, an improvement of 1.81 seconds from the mean baseline of 12.7 seconds corresponds to a TUG score of about 10.9 seconds, which is below the cut-off time for fall risk of 11.5 seconds. This significantly reduces the risk of falls in PD patients [59].
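The arithmetic behind the cut-off argument can be made explicit. The following sketch applies the pooled MD and its 95% CI bounds to the mean baseline TUG reported above; applying a group-level mean difference to a mean baseline is a simplifying assumption made here for illustration, and the function name `tug_after` is hypothetical.

```python
BASELINE_TUG = 12.7      # mean baseline TUG in seconds, as reported in the text
FALL_RISK_CUTOFF = 11.5  # fall-risk cut-off in seconds cited in the text [59]

def tug_after(change, baseline=BASELINE_TUG):
    """Expected TUG time (seconds) after applying a mean change to the baseline."""
    return baseline + change

# Pooled MD and 95% CI bounds (seconds), RT vs. passive control groups
scenarios = [
    ("point estimate", -1.81),
    ("largest gain within the CI", -3.13),
    ("smallest gain within the CI", -0.49),
]
for label, change in scenarios:
    expected = tug_after(change)
    print(f"{label}: {expected:.2f} s, below cut-off: {expected < FALL_RISK_CUTOFF}")
```

Under this simplification, the point estimate brings the mean TUG time below 11.5 seconds, whereas the smallest gain compatible with the 95% CI (0.49 seconds) would leave it above the cut-off.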
For the outcome QoL, small statistically significant effects of RT versus passive interventions for PD patients were measured. This result is in line with a recent review [11] which confirms statistically significant improvements of the QoL score after a RT for people with PD (MD = –7.22; 95% CI –12.05 to –2.39; p = 0.003). However, another review [12] found no statistically significant RT effect on QoL (SMD = 0.15; 95% CI –0.12 to 0.42; p = 0.29), but this result was based on only 3 studies with a total of 212 participants. Since there is limited data available on the effects and possible mechanisms of RT on QoL in PD patients, other patient groups were searched and significant effects of RT on QoL were found in depressed older adults [60]. Depressive symptoms were found to be common in PD patients and have considerable impacts on QoL [61]. According to the authors, the improvement of QoL is mainly due to an anti-depressive effect of RT [60]. The anti-depressive effect of RT has also been confirmed by further research [62] and could be one explanation for the beneficial effects of RT on QoL in PD as well. Another reason may be that higher muscle strength leads to better mobility and ADLs, which both are highly relevant health-related QoL factors that are also considered by the PDQ-39 [27].
Effects of resistance training in comparison to physically active control groups
No statistically significant effects of RT on muscle strength, motor impairments, mobility and balance, QoL, and depression compared to other physically active control groups were found in this review. This result is in line with a previous meta-analysis, which revealed that RT did not lead to a statistically significant higher muscle strength compared to control groups with other physically active interventions (i.e., tai chi, balance, and gait exercises) [2]. Unfortunately, no further meta-analysis has investigated the effects of RT compared to a physically active control group on any of the other outcomes so far.
One reason why RT may not have superior treatment effects compared to other physically active interventions could be that the conducted RT interventions included not only pure RT, but in some cases additional exercises (e.g., balance, agility activities). This may have led to similar training contents between the experimental and physically active control groups. Another reason could be that other physically active interventions are associated with comparable neurobiological mechanisms as RT (inducing metabolic, mechanical, and neural stimuli) and have the potential to initiate similar intracellular signaling pathways and adaptations [63, 64]. Furthermore, the specific RT program (strength-endurance, hypertrophy, maximal strength, or explosive strength) determines whether a training stimulus is rather metabolic or mechanical/neural. For example, a balance training, which is generally characterized by high frequency and low intensity exercises, may induce similar stimuli as RT. This is particularly the case when RT is of low intensity and high frequency as well, because primarily metabolic stimuli are induced (whereas maximal strength and explosive strength training primarily induce mechanical/neural stimuli). However, the transitions are continuous and not clearly distinct [49, 64]. This results in numerous highly individual adaptations within the body [64], which cannot be as clearly distinguished as illustrated in theoretical models. For these reasons, RT effects cannot always be clearly separated from other training interventions. For future studies, only pure RT, which is particularly defined as strength-endurance, hypertrophy, maximal strength, or explosive strength training, should be eligible for the experimental intervention to analyze the underlying mechanisms and specific training effects.
An important aspect of reaching conclusions from meta-analyses is to differentiate between ‘no evidence of an effect’ and ‘evidence of no effect’ by interpreting the CIs and the sample sizes for each outcome [65]. Due to narrow 95% CIs and adequate numbers of trials and participants for all analyzed outcomes, except FoG and depression, the conclusion of ‘evidence of no effect’ of RT compared to other physically active control groups seems appropriate. For FoG and depression, the 95% CIs were wide and only few studies were included in the analysis, so that the conclusion of ‘no evidence of an effect’ seems the most adequate. More RCTs investigating the effects of RT compared to physically active control groups on FoG, depression and cognition are required to increase the certainty of evidence.
In addition to the findings presented above, subgroup differences between a balance, gait and general fitness program (Balance/Gait/Multi) and a body workout with a focus on mental practices (Yoga/Tai Chi) were analyzed for QoL. A small statistically significant effect favoring physically active control groups was found. This is primarily based on the Yoga/Tai Chi subgroup, which has a SMD of 0.44 (95% CI 0.21 to 0.67; p = 0.0002) and a weight of 48%. The role of Yoga and Tai Chi on QoL was also confirmed by the χ2 test for subgroup differences (p = 0.02). The beneficial effects of yoga on QoL in various patient groups have been confirmed in a review [66], which additionally underlined the holistic approach of yoga to health. In particular, yoga has a high focus on the mind-body connection and provides not only physical, but also psychological stimuli (e.g., awareness of the self and the breath, stress management techniques). This may be a reason why yoga might have superior effects on QoL compared to a balance, gait and general fitness intervention, which primarily aims at improving physical functions.
Sex differences in resistance training
There is evidence that healthy older females and males (>50 years) adapt differently to the same RT program [67]. Taking such differences into account corresponds to the concept of “precision medicine”, which is the incorporation of individual data, such as clinical, lifestyle, genetic and biomarker information, and aims to maximize the quality of health care [68, 69]. If there is scientific evidence for different responses to RT between females and males, this may lead to future sex-specific exercise prescription and higher training adaptations for PD patients. Therefore, this review additionally conducted a subgroup analysis to examine sex differences concerning muscle strength and FoG after completing a RT in patients with PD. According to the meta-analyses, no statistically significant differences between female and male participants were detected for any outcomes. However, due to the limited number of studies included in the analyses, no certain conclusions can be drawn and more RCTs differentiating their analyses for female and male patients are needed.
Safety and adverse events in resistance training
Addressing AEs in studies examining healthcare interventions is an important aspect to identify threats and to monitor the safety of the specific intervention. According to the findings of this review, it can be summarized that RT can be considered as relatively safe for people with PD. However, it should be noted that only supervised RT interventions were included in the analysis and the results may differ from non-supervised interventions. This conclusion is in line with another recent review, which analyzed AEs of general exercise therapy across different patient groups and supports the general assumption that exercise therapy is a relatively safe intervention [70]. Nevertheless, more studies should consider AEs to achieve a balanced perspective of adverse and beneficial effects of the intervention, also in consideration of the specific patient population. It can also be concluded that the methodological reporting of AEs is inadequate. Because of a high diversity in the type and amount of documenting AEs, as well as variable definitions of AEs, methods of assessment and analyzed time-courses, a systematic and standardized collection of data on AEs was not possible in this review. Therefore, future studies should implement methods to systematically assess and monitor AEs in all trial arms, utilize validated instruments and use consistent definitions of AEs.
Heterogeneity and reporting quality
This study revealed that several meta-analyses demonstrated substantial heterogeneity. This may be explained by different characteristics of the participants and interventions (clinical heterogeneity). Furthermore, variability in the eligibility criteria, outcome measurement tools and risk of bias are possible reasons for clinical and methodological heterogeneity [71]. The different eligibility criteria of the included studies may have influenced the magnitude of the treatment effect. In addition, the RT programs delivered in the included studies varied considerably with respect to the setting, trained muscle groups, training materials, intensity, frequency, duration, and length of the intervention. Another reason for the high heterogeneity, especially in the analysis of depression, could be the variable use of test instruments. The substantial heterogeneity in the analysis of muscle strength is due to one study only [47], which has a SMD of –3.74, while the other eight studies show consistently smaller effects, ranging from –0.27 to –1.19. By conducting subgroup analyses or meta-regression, the specific causes of heterogeneity may be determined. However, reliable conclusions can only be drawn from analyses which were pre-specified [36]. Future studies should take this into account and pre-specify possible varying characteristics between studies to identify reasons for heterogeneity.
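How a single outlying study can account for most of the measured heterogeneity may be illustrated with the usual definition of I2 based on Cochran's Q. The sketch below assumes inverse-variance weights and uses hypothetical toy data, not data from the included studies; the function name `i_squared` is likewise hypothetical.

```python
def i_squared(effects, variances):
    """I^2 (in percent) from Cochran's Q under inverse-variance weighting."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        return 0.0  # I^2 is truncated at zero when Q does not exceed its degrees of freedom
    return (q - df) / q * 100

# Hypothetical toy data (NOT from the review): the last study is an outlier
effects = [-0.3, -0.5, -0.4, -0.6, -3.0]
variances = [0.04, 0.05, 0.04, 0.06, 0.30]

with_outlier = i_squared(effects, variances)
without_outlier = i_squared(effects[:-1], variances[:-1])
print(round(with_outlier), round(without_outlier))
```

In this toy example, removing the outlying study reduces I2 from a considerable value to zero, which mirrors the kind of leave-one-out check that can show whether heterogeneity is attributable to one study.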
Limitations
With regard to the reporting quality of the included studies, it should be noted that certain study information was insufficient in several trials, which led to limited interpretation of the results in some cases. In particular, missing data concerning the specific H&Y stage, medication and a standardized value of the physical capability of the study sample at baseline restricted the classification of the study sample. Furthermore, inadequate information on the attendance of the intervention groups limits the interpretation of treatment effects. For future studies it is therefore recommended to apply guidelines such as the CONSORT statement for reporting RCTs of nonpharmacological treatments [72] to enhance reporting quality. In addition, more journals should require reporting according to CONSORT as a premise for publication. In terms of RoB, blinding was the main limitation of the included studies. While blinding of the outcome assessors is possible but has not been applied in all included studies, blinding of participants and personnel is not possible in RT interventions. Hence, a high focus on blinding of the outcome assessors and all other RoB domains is important when conducting RCTs to minimize the overall bias.
One further limitation of the review was the marked differences in the RT programs, including materials, exercises, muscle groups, intensity, frequency, duration, and length. Not only the RT programs differed widely, but also the instruments used for outcome measurement.
Furthermore, the results of this review only provide information about short-term effects. Future research should also investigate long-term effects of RT in PD in order to make more accurate predictions about lasting benefits.
Another aspect that was neglected was training adherence. Consequently, no conclusion can be drawn regarding adherence rates and dose-response relationships. The investigation of training adherence is recommended for future systematic reviews to draw conclusions about dose-response relationships. However, the analysis is highly dependent on adequate and complete reporting of adherence rates within the included studies.
Although AEs were addressed in this review, no specific fall-related measures were included. This was because only two studies evaluated falls using a valid assessment tool, such as the FES-I (Supplementary Table 2).
Since the RT interventions of this study included not only pure RT, but in some cases also additional training contents (e.g., balance exercises, agility activities), similar training contents between the experimental and physically active control groups may have resulted. Therefore, this methodological limitation could have contributed to the absence of statistically significant effects of RT compared to other physically active control groups. In addition, non-physically active control groups (i.e., cognitive training) were not available for this study. Hence, no conclusion can be drawn about possible superior effects of RT compared to non-physically active control groups. It would also have been preferable to distinguish between strictly passive (no intervention and usual care) and placebo (information sessions and stretching) control groups, to reach more accurate results on the effects. However, this would have resulted in a loss of power of the meta-analyses.
Another limitation that is caused by a low number of studies is that the test power of the funnel plots was too low to distinguish chance from real asymmetry [73]. Accordingly, a limited judgement can be made about publication bias.
Conclusion
Our analyses revealed that RT improves muscle strength, motor impairment, mobility and balance, and QoL, in PD patients. However, RT led to no superior effects on muscle strength, motor impairment, mobility and balance, QoL, and depression compared to other physically active interventions in PD subjects. Therefore, RT may be neither superior nor inferior to other sports considered in this review. PD patients are recommended to exercise, but the type of sport may play a minor role. Therefore, factors such as individual preferences of the PD patients should be considered in order to maintain fun, motivation and, thus, therapy adherence in the long term. Moreover, supervised RT is relatively safe for PD patients. However, these findings were based on RCTs with limited methodological quality and should be considered with caution.
Overall, RT and other physically active interventions can be recommended for PD patients to improve motor and non-motor symptoms. Training parameters should precisely adhere to the specific recommendations of strength-endurance, hypertrophy, maximal strength, or explosive strength in future studies to ensure greater comparability. However, more studies are necessary to further understand the beneficial effects and to develop best-practice RT programs for people with PD, while investigating possible sex differences.
Footnotes
ACKNOWLEDGMENTS
This project is funded by the German Ministry of Education and Research (grant no 01KG1902).
CONFLICT OF INTEREST
M.E. has received a grant from the German Ministry of Education and Research (BMBF) which does not lead to a conflict of interest.
M.R. has received a grant from the German Ministry of Education and Research (BMBF), and the Brandau-Laibach Foundation.
N.S. has received a grant from the German Ministry of Education and Research (BMBF) which does not lead to a conflict of interest.
E.K. has received grants from the German Ministry of Education and Research, ParkinsonFonds Deutschland gGmbH, the German Parkinson Society, and the German Alzheimer’s Society, and honoraria from Oticon GmbH, Hamburg, Germany; Lilly Pharma GmbH, Bad Homburg, Germany; Bernafon AG, Bern, Switzerland; and Desitin GmbH, Hamburg, Germany. E.K. is author of the cognitive intervention programs “NEUROvitalis” but receives no corresponding honoraria.
A.K.F. has received grants from the German Parkinson Society and the German Alzheimer’s Society, as well as honoraria from Springer Medizin Verlag GmbH, Heidelberg, Germany; Springer-Verlag GmbH, Berlin; ProLog Wissen GmbH, Cologne, Germany; pro audito Switzerland, Zürich, Switzerland; Seminar- und Fortbildungszentrum Rheine, Germany; and LOGOMANIA, Fendt & Sax GbR, Munich, Germany. A.K.F. is author of the cognitive intervention programs “NEUROvitalis” but receives no corresponding honoraria.
The remaining authors have no conflict of interest to report.
