Abstract
Background. Robot technology for poststroke rehabilitation is developing rapidly. A number of new randomized controlled trials (RCTs) have investigated the effects of robot-assisted therapy for the paretic upper limb (RT-UL). Objective. To systematically review the effects of poststroke RT-UL on measures of motor control of the paretic arm, muscle strength and tone, upper limb capacity, and basic activities of daily living (ADL) in comparison with nonrobotic treatment. Methods. Relevant RCTs were identified in electronic searches. Meta-analyses were performed for measures of motor control (eg, Fugl-Meyer Assessment of the arm; FMA arm), muscle strength and tone, upper limb capacity, and basic ADL. Subgroup analyses were applied for the number of joints involved, robot type, timing poststroke, and treatment contrast. Results. Forty-four RCTs (N = 1362) were included. No serious adverse events were reported. Meta-analyses of 38 trials (N = 1206) showed significant but small improvements in motor control (~2 points FMA arm) and muscle strength of the paretic arm and a negative effect on muscle tone. No effects were found for upper limb capacity and basic ADL. Shoulder/elbow robotics showed small but significant effects on motor control and muscle strength, while elbow/wrist robotics had small but significant effects on motor control. Conclusions. RT-UL allows patients to increase the number of repetitions and hence intensity of practice poststroke, and appears to be a safe therapy. Effects on motor control are small and specific to the joints targeted by RT-UL, whereas no generalization is found to improvements in upper limb capacity. The impact of RT-UL started in the first weeks poststroke remains unclear. 
These limited findings may largely reflect a poor understanding of robot-induced motor learning, as well as inadequate design of RT-UL trials: failure to select stroke patients with a potential for recovery at baseline, lack of fixed timing of baseline assessments, and insufficient treatment contrast early poststroke.
Introduction
Stroke is the second leading cause of mortality and the third leading cause of long-term disability worldwide, with 33 million stroke survivors.1-3 A majority of patients with hemispheric stroke have limited use of the affected upper limb. In the first days after stroke onset, this concerns about 80% of the patients, 4 while deficits in upper limb capacity persist at 6 months poststroke in 30% 5 to 66% 6 of the hemiplegic stroke patients. One year after stroke, upper limb deficits are accompanied by higher levels of anxiety, 7 lower perceived health-related quality of life, 8 and reduced self-reported well-being. 9 Hence, improving upper limb capacity is a major therapeutic target in stroke rehabilitation.10,11
Several systematic reviews suggest that intensity of practice, typically expressed by number of repetitions, and task-specific training are the main drivers of effective motor rehabilitation interventions after stroke.12,13 The devices used in robot-assisted therapy for the upper paretic limb (RT-UL) target either the shoulder/elbow, elbow, elbow/wrist, wrist/hand, or the upper limb as a whole. The devices are designed based on different principles by using (a) exoskeletons with torque actuators controlling one or more joints of the paretic upper limb14,15 or (b) end-effectors in which only the most distal part of the paretic limb is guided. RT-UL can offer patients feedback about position and force and is typically supported by games that facilitate functional use of the upper paretic limb.14,15 An alternative classification of upper limb robotics can be based on the number of joints (ie, arm segments), and hence the degrees of freedom that they control in upper limb performance out of basic motor synergies (ie, synergy-independent movements). 16
A number of systematic reviews about evidence for RT-UL after stroke have been published in the last decade.11,17-19 Our previous meta-analysis, 17 involving 10 eligible trials (N = 218), found a small but significant effect on motor recovery (eg, Fugl-Meyer Assessment [FMA] scores) of the paretic arm (~2 points) without significant effects on upper limb capacity or activities of daily living (ADL). However, the meta-analysis showed significant heterogeneity between trials when comparing distal and proximal arm robotics. A few years later, Norouzi-Gheidari et al 18 compared RT-UL with only dose-matched conventional therapy and reported no significant differences in terms of motor recovery after pooling 10 trials in a meta-analysis. In a similar vein, a Cochrane review by Mehrholz et al, 19 involving 34 trials (N = 1160), showed that patients who receive robot-assisted arm training after stroke are more likely to improve their generic ADL. However, they also stated that variation between trials regarding duration and amount of training, type of treatment, and patient characteristics hampered comparison and proper interpretation of reported effects. 19 Unfortunately, no sensitivity analysis was performed with respect to dose-matched and non–dose-matched trials, making the added value of RT-UL therapy unclear.
The field of upper limb robotics in stroke rehabilitation is rapidly developing as new commercial devices become available. This is also shown by the number of randomized controlled trials (RCTs) investigating the added value of different types of RT-UL poststroke. 13 The first aim of the present systematic review was to determine the effects of RT-UL in patients after stroke on outcomes of motor control of the paretic upper limb, upper limb capacity, and basic ADL, in comparison with nonrobotic treatment. Secondary outcomes were muscle strength and muscle tone. The second aim was to analyze subgroups regarding (a) robot type, that is, exoskeleton or end-effector robotic devices; (b) joints involved (eg, shoulder/elbow RT-UL, whole arm RT-UL); (c) timing of RT-UL delivery poststroke; and (d) treatment contrast in terms of time spent in exercise therapy. Finally, safety and acceptance of RT-UL were investigated by assessing the incidence of adverse events and the number of dropouts during the study period. We also assessed the impact of small study effects, acknowledging that small trials with neutral or negative results are less likely to be published in the literature. 20
Methods
Definitions
The World Health Organization (WHO) defines stroke as “rapidly developing clinical symptoms and/or signs of focal, and at times global, disturbance of cerebral function, lasting more than 24 hours or leading to death, with no apparent cause other than that of vascular origin.” 21 Rehabilitation robotics was defined following the Medical Subject Heading (MeSH) definition as “the application of electronic, computerized control systems to mechanical devices designed to assist human functions in rehabilitation, formerly restricted to industry, but nowadays applied to artificial organs controlled by bionic (bioelectronic) devices, like automated insulin pumps and other prostheses.” 22
Based on the nature of control exerted on the upper limb, we classified rehabilitation robotics into exoskeleton devices and end-effector devices. Exoskeleton devices were defined as “external structural devices with axes aligned with anatomical axes of the human body, providing direct control of individual joints.” 15 End-effector devices were defined as “systems with a single distal attachment point to apply mechanical forces to the distal segment of a limb.”14,15 The present systematic review was restricted to RCTs, defined as “a study in which the subjects followed in the trial were definitely or possibly assigned prospectively to one of two (or more) alternative forms of health care using random allocation.” 23
Dose-matched trials referred to trials in which the experimental and control groups spent an equal amount of time on exercise therapy. Trials in which the groups did not spend the same amount of time on exercise therapy were considered non–dose-matched. 24
Literature Search
Analogous to our 2008 review, 17 we systematically searched Medline, CINAHL, EMBASE, and PEDro. Online Supplemental Table 1 shows the applied search strategy for identifying relevant literature. Studies were collected from inception up to 20 August 2015. RCTs were included when (a) patients had been diagnosed with stroke, (b) effects of RT-UL were investigated, (c) outcomes were assessed in terms of motor recovery and/or upper limb capacity and/or basic ADL and/or muscle strength and tone postintervention, (d) the study used an RCT design, and (e) the article was written in English, German, or Dutch. RCTs that compared the effects of 2 different types of RT-UL without a comparison with a control group that received no RT-UL were excluded. Relevant publications were selected by one reviewer (JMV or ACL). Reference lists of included RCTs, and relevant systematic and narrative reviews, were screened for relevant publications. Authors of relevant publications, including conference abstracts, were contacted for data when postintervention means and/or SDs were not reported.
Methodological Quality
We rated the methodological quality of included trials using the PEDro scale. 25 One reviewer (JMV or ACL) scored all RCTs and crosschecked the scores with the PEDro database (http://www.pedro.org.au). In case of disagreement, the other reviewer (JMV or ACL) made the final decision. For RCTs not listed in the PEDro database, 2 reviewers (JMV and ACL) independently assessed the methodological quality, solving any disagreements in a consensus meeting. Reviewers were not blinded to author(s), institution(s), or journal. PEDro scores of 4 points or more were classified as “sufficient quality,” whereas studies with 3 points or less were classified as “insufficient quality” and were subsequently excluded from meta-analysis. 26
Outcomes
We classified outcomes according to the domains of the International Classification of Functioning, Disability and Health model (ICF), 27 investigating the following ICF constructs: (a) motor control, (b) muscle strength, (c) muscle tone, (d) upper limb capacity, and (e) basic ADL.
Body Function Level
The motor part of the FMA for the arm (FMA arm) or subscales of FMA arm (eg, FMA shoulder/elbow/coordination [FMA-SEC] and FMA wrist/hand [FMA-WH]), as well as the impairment inventory section of the Chedoke-McMaster Stroke Assessment, were classified under the ICF domain “control of voluntary movement functions” [b760] (ie, motor control). Both capacity-based indexes reflect patients’ ability to move the paretic arm “out-of-synergy” and to “dissociate” from their basic limb synergies poststroke (ie, synergy independent). 28
The Medical Research Council scale, Motricity Index arm subscale (MI-arm), and Motor Power Scale (MPS) represent strength and were classified under the ICF domain “muscle power functions” [b730].
Measures such as the (Modified) Ashworth Scale (AS or MAS) were classified under the ICF domain “muscle tone functions” [b735].
Activities Level
We defined upper limb capacity according to the ICF domains “fine hand use” [d440] and “hand and arm use” [d445]. This domain includes not only laboratory measures such as the Action Research Arm Test (ARAT), Wolf Motor Function Test (WMFT), Arm Motor Ability Test (AMAT), and Box and Blocks Test (BBT) but also measurements of real-world use of the paretic arm in the patient’s daily life, such as activity monitoring.
Basic ADL was defined as a combination of the following domains: “washing oneself” [d510], “toileting” [d520], “dressing” [d540], “eating” [d550], and “urination functions” [d620], and involves measurements such as the Functional Independence Measure (FIM), modified Rankin Scale (mRS), and Barthel Index (BI).
Quantitative Analysis
The following extracted data were entered in Microsoft Word files by JMV or ACL: first author, year of publication, mean age of the patients in each group, poststroke phase, intervention characteristics (type of robot, ie, exoskeleton or end-effector; the number of joints involved; the number of degrees of freedom that are controlled by the applied robot; the duration of RT-UL; used comparator in the trial). The number of patients in each group, mean scores on the scales for motor control, muscle strength and tone, upper limb capacity, and basic ADL after intervention, and their standard deviations (SDs) in experimental and control groups were entered in RevMan 5.3 by one reviewer (JMV or ACL) and cross-checked by another reviewer (JMV or ACL); the classification of joints targeted by RT-UL was also cross-checked. For all outcomes, except MAS, WMFT, and AMAT, a higher score was regarded as positive. The WMFT and AMAT scores were multiplied by −1 to allow pooling with the other upper limb capacity outcomes.
For the FMA arm, FMA-SEC, and FMA-WH, we established the mean difference (MD) in individual studies by calculating the difference between the means of the experimental and control groups. For muscle strength, muscle tone, upper limb capacity, and basic ADL, we established the standardized mean difference (SMD) based on Hedges’ g for individual studies by calculating the difference between the means of the experimental and control groups, divided by the average population standard deviation (SDi). If necessary, means and SDi were requested from the respective authors. The MD or SMD values of individual studies were averaged, resulting in a summary effect size (SES) with corresponding 95% confidence interval (CI). Following Cohen, 29 the effect sizes were classified into small (<0.2), medium (0.2-0.8), and large (>0.8).
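As an illustration of the SMD calculation described above, the following Python sketch computes Hedges’ g for a single trial and applies the size thresholds stated in the text. It is a minimal sketch under stated assumptions, not the RevMan implementation: the function names, the use of the pooled within-group SD as the denominator, the common small-sample correction J = 1 − 3/(4N − 9), and the example numbers are all assumptions made for illustration.

```python
import math


def hedges_g(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl, higher_is_better=True):
    """Standardized mean difference (Hedges' g) for one trial.

    Assumption: the denominator is the pooled within-group SD.
    """
    pooled_sd = math.sqrt(
        ((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2)
        / (n_exp + n_ctl - 2)
    )
    d = (mean_exp - mean_ctl) / pooled_sd
    # Scales on which a lower score is better are multiplied by -1
    # so that a positive g always favors the experimental group.
    if not higher_is_better:
        d = -d
    # Small-sample correction factor (common approximation).
    j = 1 - 3 / (4 * (n_exp + n_ctl) - 9)
    return j * d


def classify_effect(g):
    """Thresholds as stated in the text: <0.2, 0.2-0.8, >0.8."""
    size = abs(g)
    if size < 0.2:
        return "small"
    if size <= 0.8:
        return "medium"
    return "large"


# Hypothetical trial: 20 patients per group, SD of 10 in both groups.
g = hedges_g(30.0, 10.0, 20, 25.0, 10.0, 20)
print(round(g, 3), classify_effect(g))
```

The sign flip mirrors the handling of scales on which lower scores indicate better performance, so that all outcomes pooled within one domain point in the same direction.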
We calculated the I2 statistic to determine between-study variation. 23 In case of statistical heterogeneity, defined as an I2 statistic of ≥50%, a random-effects model was applied. A fixed-effect model was applied for I2 statistics <50%.
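The model-selection rule above can be sketched in Python as follows. This is an illustrative sketch only: it assumes inverse-variance weights, Cochran's Q, and the usual definition I2 = max(0, (Q − df)/Q) × 100%; the function names and the example effect sizes are hypothetical and are not taken from the included trials.

```python
def fixed_effect_pooled(effects, standard_errors):
    """Inverse-variance weighted mean of the study effects."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, weights


def i_squared(effects, standard_errors):
    """I2 (in percent) derived from Cochran's Q."""
    pooled, weights = fixed_effect_pooled(effects, standard_errors)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def choose_model(i2_percent):
    """Rule from the text: random effects when I2 >= 50%."""
    return "random-effects" if i2_percent >= 50 else "fixed-effect"


# Hypothetical effect sizes and standard errors for three trials.
effects = [2.0, 2.5, 1.5]
standard_errors = [1.0, 1.2, 0.9]
i2 = i_squared(effects, standard_errors)
print(round(i2, 1), choose_model(i2))
```

With this rule the pooling model is determined by the observed heterogeneity, which is how the terms "homogeneous" and "heterogeneous" are used for the summary effect sizes in the Results.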
A first subgroup analysis was based on the joints involved in the treatment. The RT-UL groups were divided into 6 subgroups: shoulder/elbow, whole arm (shoulder/elbow/wrist and/or hand), elbow, elbow/wrist/hand, wrist/hand, and hand. Three other subgroup analyses were performed for exoskeleton versus end-effector robotics, time of RT-UL delivery (early-started [<3 months] versus late-started [≥3 months]), and therapy dose (dose-matched versus non–dose-matched trials).
All analyses were performed using RevMan 5.1 and a P value <.05 was considered statistically significant.
Results
Literature Search
Figure 1 shows the flowchart of the search. A total of 44 RCTs were included30-75 out of 341 unique hits identified in the electronic search, and 1 RCT was identified by reference checking. 53 Online Supplemental Table 2 shows the main characteristics of the included RCTs. Duration of the intervention ranged from 2 weeks 53 to 12 weeks, 36 with a total therapy time of 0.5 hours 41 to 90 hours. 36 Number of repetitions per session ranged from 50 38 to between 2700 and 3600. 54 The 5 most commonly applied robotic devices were the MIME robot,31,34,39,51 the BiManuTrack,37,54,57,59-61,67 the NeReBo,40,42,55,69 the MIT MANUS,30,32,35,43,48,71 and the InMotion Shoulder Elbow Robot.36,44,52,74

Figure 1. Literature search flowchart.
Methodological Quality
The median PEDro score was 6 (interquartile range, 5-6.25) (see Online Supplemental Table 3). Twenty percent of the RCTs described that the allocation was concealed, while 34% of the trials performed an intention-to-treat analysis.
Quantitative Analysis
Thirty-eight (N = 1206) of the 44 identified RCTs were found suitable for quantitative analysis. Six trials were excluded from pooling: 1 trial due to insufficient methodological quality 41 and 5 trials due to lack of point measures and/or measures of variability.31,33,38,47,49 Pooling of data was not possible for the RT-UL elbow, 63 elbow/wrist/hand, 50 and wrist 46 subgroups, because only 1 RCT was available for each. The forest plots of all performed meta-analyses can be found online (Online Supplemental Figures at http://nnr.sagepub.com/supplemental).
Motor Control
The FMA arm score was assessed in 28 RCTs (N = 884), with a total of 34 comparisons. Overall, a significant homogeneous SES (MD 2.23, 95% CI 0.87 to 3.59; Z = 3.21, P = .001, I2 = 30%) was found (Figure 2). The FMA-SEC score (maximum 42 points) was reported in 14 RCTs, representing 17 comparisons (N = 369). An overall significant homogeneous SES (MD 2.62, 95% CI 1.48 to 3.76; Z = 4.50, P < .00001, I2 = 34%) was found in favor of RT-UL. The FMA-WH score was determined in 17 RCTs (N = 443) representing 21 comparisons. A nonsignificant heterogeneous SES (MD 1.22, 95% CI −0.61 to 3.05; Z = 1.30, P = .19, I2 = 75%) was found.
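The reported Z statistics can be cross-checked against the reported confidence intervals. The sketch below is an illustration under assumptions rather than the procedure used by RevMan: it assumes a symmetric 95% CI built with a critical value of 1.96, and the function names are hypothetical. It is applied to the overall FMA arm result quoted above (MD 2.23, 95% CI 0.87 to 3.59).

```python
import math


def z_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the Z statistic from a point estimate and a symmetric CI."""
    standard_error = (upper - lower) / (2 * z_crit)
    return estimate / standard_error


def two_sided_p(z):
    """Two-sided P value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Overall FMA arm result as reported in the text.
z = z_from_ci(2.23, 0.87, 3.59)
print(round(z, 2), round(two_sided_p(z), 4))
```

A recovered Z close to the reported value indicates that the point estimate, interval, and test statistic are mutually consistent; a point estimate lying outside its own interval would signal a reporting error.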

Figure 2. Forest plot of all robot therapy for the upper limb (RT-UL) versus any type of control for the outcome of motor control of the paretic arm postintervention.
Shoulder/elbow robotics
Fourteen RCTs (N = 528) with a total of 17 shoulder/elbow comparisons assessed the FMA arm score, yielding an overall significant homogeneous SES (MD 2.45, 95% CI 0.63 to 4.27; Z = 2.64, P = .008, I2 = 14%). Eight RCTs (N = 228), with a total of 10 comparisons, assessed the FMA-SEC score. A significant homogeneous SES (MD 2.15, 95% CI 0.73 to 3.57; Z = 2.96, P = .003, I2 = 31%) was found in favor of RT-UL (Figure 3). Ten RCTs (N = 290), representing 13 comparisons, assessed the FMA-WH score, resulting in a nonsignificant homogeneous SES (MD 0.66, 95% CI −0.39 to 1.72; Z = 1.24, P = .22, I2 = 33%).

Figure 3. Forest plot of shoulder/elbow robot therapy for the upper limb (RT-UL) versus any type of control for the outcome of motor control of the proximal part of the paretic arm postintervention.
Whole arm robotics
Two RCTs (N = 62) with a total of 2 comparisons assessed the effects of whole-arm robotics on the FMA arm score. Pooling resulted in a nonsignificant heterogeneous SES (MD 2.17, 95% CI −11.90 to 16.23; Z = 0.30, P = .76, I2 = 81%).
Shoulder/elbow/wrist robotics
In the shoulder/elbow/wrist/hand group, 3 RCTs (N = 102) with a total of 4 comparisons assessed the FMA arm score, yielding a nonsignificant homogeneous SES (MD −0.36, 95% CI −3.48 to 2.76; Z = 0.23, P = .82, I2 = 0%).
Elbow/wrist robotics
The FMA arm score was assessed in 5 RCTs (N = 131) with a total of 7 comparisons. Pooling resulted in a significant homogeneous SES (MD 6.55, 95% CI 2.60 to 10.51; Z = 3.25, P = .001, I2 = 46%). The FMA-SEC score was assessed in 2 RCTs (N = 65) with a total of 3 comparisons, resulting in a significant homogeneous SES (MD 3.51, 95% CI 0.96 to 6.07; Z = 2.70, P = .007, I2 = 44%). Pooling FMA-WH data of 2 RCTs (N = 65) with a total of 3 comparisons resulted in a nonsignificant heterogeneous SES (MD 4.88, 95% CI −1.22 to 10.99; Z = 1.57, P = .12, I2 = 53%).
Wrist/hand robotics
Two RCTs (N = 30) assessed the FMA-SEC. Pooling yielded a nonsignificant homogeneous SES (MD 4.28, 95% CI −1.06 to 9.63; Z = 1.57, P = .12, I2 = 0%). Two RCTs (N = 30) determined the FMA-WH score, yielding a nonsignificant homogeneous SES (MD 1.60, 95% CI −2.30 to 5.50; Z = 0.80, P = .42, I2 = 0%).
Upper Limb Capacity
The functional outcome in terms of upper limb capacity was assessed with laboratory measures in 20 RCTs (N = 682) with a total of 24 RT-UL groups, yielding a nonsignificant homogeneous SES (SMD 0.04, 95% CI −0.12 to 0.19; Z = 0.47, P = .64, I2 = 2%) (Figure 4). Pooling was not possible for real-world arm use, because only 1 trial reported this outcome. 72

Figure 4. Forest plot of all robot therapy for the upper limb (RT-UL) versus any type of control for the outcome upper limb capacity as measured in the laboratory.
Whole arm robotics
Upper limb capacity assessed with laboratory measures was reported in 2 RCTs (N = 62) and pooling resulted in a nonsignificant homogeneous SES (SMD 0.07, 95% CI −0.13 to 0.27; Z = 0.69, P = .49, I2 = 17%).
Shoulder/elbow robotics
Upper limb capacity was assessed in 12 RCTs (N = 413) with a total of 15 comparisons, yielding a nonsignificant homogeneous SES (SMD 0.07, 95% CI −0.13 to 0.27; Z = 0.69, P = .49, I2 = 17%).
Hand robotics
Two RCTs (N = 39) with a total of 2 comparisons assessed upper limb capacity in the laboratory, yielding a nonsignificant homogeneous SES (SMD 0.19, 95% CI −0.45 to 0.82; Z = 0.58, P = .56, I2 = 0%).
Basic Activities of Daily Living
Basic ADL was assessed in 14 RCTs (N = 427) with a total of 17 comparisons, yielding a nonsignificant heterogeneous SES (SMD 0.27, 95% CI −0.05 to 0.59; Z = 1.67, P = .09, I2 = 56%).
Shoulder/elbow robotics
Pooling was possible for 11 RCTs (N = 330) with a total of 14 comparisons, yielding a nonsignificant heterogeneous SES (SMD 0.24, 95% CI −0.17 to 0.65; Z = 1.15, P = .25, I2 = 64%) for basic ADL.
Muscle Strength
Muscle strength of the paretic arm was assessed in 15 RCTs (N = 494) with a total of 21 comparisons, yielding a nonsignificant heterogeneous SES (SMD 0.19, 95% CI −0.12 to 0.50; Z = 1.23, P = .22, I2 = 56%).
Shoulder/elbow robotics
In the shoulder/elbow group, 7 RCTs (N = 254) with a total of 10 comparisons assessed strength of the paretic arm, yielding a significant homogeneous SES (SMD 0.36, 95% CI 0.10 to 0.63; Z = 2.73, P = .006, I2 = 44%) in favor of RT-UL.
Medical Research Council scores for individual muscle groups were reported in 3 RCTs (N = 71), yielding a nonsignificant homogeneous SES (SMD 0.01, 95% CI −0.47 to 0.48; Z = 0.03, P = .98, I2 = 38%) for shoulder abductor strength, a nonsignificant homogeneous SES (SMD 0.08, 95% CI −0.39 to 0.55; Z = 0.33, P = .74, I2 = 0%) for elbow flexor strength, and a nonsignificant homogeneous SES (SMD 0.11, 95% CI −0.36 to 0.58; Z = 0.46, P = .65, I2 = 35%) for wrist flexor strength.
Shoulder/elbow/wrist robotics
Strength of the paretic arm was reported in 2 RCTs (N = 88) with a total of 3 comparisons. Pooling these data resulted in a nonsignificant homogeneous SES (SMD −0.14, 95% CI −0.57 to 0.29; Z = 0.63, P = .53, I2 = 22%).
Elbow/wrist robotics
In the elbow/wrist group, 3 RCTs (N = 82) with a total of 5 comparisons assessed overall muscle strength of the paretic arm, yielding a nonsignificant heterogeneous SES (SMD 0.51, 95% CI −0.31 to 1.34; Z = 1.22, P = .22, I2 = 59%).
Muscle Tone
Muscle tone of the paretic arm was assessed with the MAS in 13 RCTs (N = 429) with a total of 18 comparisons, yielding a significant homogeneous SES (SMD 0.24, 95% CI 0.04 to 0.44; Z = 2.36, P = .02, I2 = 25%), in favor of the control group.
Pooling muscle tone scores of individual muscle groups resulted in a nonsignificant homogeneous SES (SMD −0.16, 95% CI −0.55 to 0.23; Z = 0.82, P = .41, I2 = 46%; 4 RCTs, N = 107) for the elbow flexors and a nonsignificant heterogeneous SES (SMD 0.28, 95% CI −0.91 to 1.46; Z = 0.46, P = .65, I2 = 75%; 3 RCTs, N = 54) for the wrist flexors.
Shoulder/elbow robotics
Seven RCTs (N = 206) with a total of 10 comparisons assessed the MAS score of the paretic arm. A nonsignificant heterogeneous SES (SMD 0.43, 95% CI −0.02 to 0.87; Z = 1.89, P = .06, I2 = 50%) was found.
Shoulder/elbow/wrist robotics
Two RCTs (N = 88) measured muscle tone of the paretic arm. Pooling resulted in a nonsignificant homogeneous SES (SMD 0.42, 95% CI −0.01 to 0.84; Z = 1.90, P = .06, I2 = 0%).
Figure 5 shows a summary of found evidence of RT-UL in terms of synergy-independent motor control, muscle strength and tone, upper limb capacity, and basic ADL according to ICF classification. In addition, overall evidence is presented for different subgroups with respect to (a) the joints involved, (b) type of robot (exoskeleton or end-effector device), (c) timing of the trial (ie, started within or beyond 3 months poststroke), and (d) dosing of therapy (ie, dose-matched or non–dose-matched trials). The evidence for these subgroup analyses is presented below.

Figure 5. Summary of found evidence for the joints targeted by robot therapy for the upper limb (RT-UL), and for type (ie, exoskeleton or end-effector), timing poststroke (ie, <3 or ≥3 months), and treatment contrast in trials of time spent in RT-UL (ie, dose-matched or non–dose-matched trials). (✓) Beneficial or likely to be beneficial; (×) uncertain benefit; (?) unknown effect; (Ɵ) negative effect.
Exoskeleton Versus End-Effector Robotics
For exoskeleton robotics, 9 RCTs (N = 214) with 10 comparisons assessed the FMA arm score, yielding a homogeneous nonsignificant SES (MD 1.76, 95% CI −0.29 to 3.82; Z = 1.68, P = .09, I2 = 22%). Nonsignificant SESs were also found for the outcomes FMA-WH (MD −3.54, 95% CI −9.78 to 2.70; Z = 1.11, P = .27, I2 = 73%; 2 RCTs, 2 comparisons, N = 31), muscle strength (SMD −0.14, 95% CI −0.57 to 0.29; Z = 0.63, P = .53, I2 = 22%; 2 RCTs, 3 comparisons, N = 88), muscle tone (SMD 0.42, 95% CI −0.01 to 0.84; Z = 1.90, P = .06, I2 = 0%; 2 RCTs, 3 comparisons, N = 88), and upper limb capacity (SMD 0.18, 95% CI −0.16 to 0.53; Z = 1.03, P = .30, I2 = 0%; 4 RCTs, 4 comparisons, N = 130).
For end-effector robotics, significant effects were found for FMA arm scores (MD 2.59, 95% CI 0.77 to 4.41; Z = 2.79, P = .005, I2 = 35%; 19 RCTs, 24 comparisons, N = 670), FMA-SEC (MD 2.62, 95% CI 1.48 to 3.76; Z = 4.50, P < .00001, I2 = 34%; 14 RCTs, 17 comparisons, N = 369), and FMA-WH (MD 1.77, 95% CI 0.31 to 3.23; Z = 2.38, P = .02, I2 = 74%; 15 RCTs, 19 comparisons, N = 412) in favor of RT-UL. Nonsignificant SESs were found for muscle strength (SMD 0.20, 95% CI −0.14 to 0.54; Z = 1.16, P = .24, I2 = 57%; 12 RCTs, 17 comparisons, N = 406), muscle tone (SMD 0.19, 95% CI −0.03 to 0.41; Z = 1.67, P = .09, I2 = 33%; 11 RCTs, 15 comparisons, N = 341), upper limb capacity (SMD 0.00, 95% CI −0.17 to 0.17; Z = 0.01, P = .99, I2 = 6%; 17 RCTs, 21 comparisons, N = 552), and basic ADL (SMD 0.27, 95% CI −0.05 to 0.59; Z = 1.67, P = .09, I2 = 56%; 14 RCTs, 17 comparisons, N = 427).
Timing Poststroke
Ten early-start RT-UL trials involving 11 comparisons (N = 360) reported FMA arm scores, yielding a nonsignificant heterogeneous SES (MD 3.13, 95% CI −2.39 to 8.65; Z = 1.11, P = .27, I2 = 62%). Also in the late-start RT-UL trials, a nonsignificant homogeneous SES (MD 1.52, 95% CI −0.04 to 3.09; Z = 1.90, P = .06, I2 = 0%; 18 RCTs, 23 comparisons, N = 506) was found. Both in the early-start and late-start trials, a significant homogeneous SES was found for FMA-SEC (MD 2.81, 95% CI 1.44 to 4.17; Z = 4.03, P < .0001, I2 = 44%; 8 RCTs, 10 comparisons, N = 251; and MD 2.17, 95% CI 0.09 to 4.25; Z = 2.05, P = .04, I2 = 23%; 6 RCTs, 7 comparisons, N = 118, respectively). Pooling data of FMA-WH resulted in a significant heterogeneous SES in the early-start trials (MD 2.53, 95% CI 0.46 to 4.60; Z = 2.40, P = .02, I2 = 85%; 8 RCTs, 10 comparisons, N = 251) in favor of RT-UL, but in a nonsignificant SES in the late-start trials (MD −0.19, 95% CI −1.65 to 1.27; Z = 0.25, P = .80, I2 = 27%; 9 RCTs, 11 comparisons, N = 192). The meta-analyses in the early and late-start trials were nonsignificant for muscle strength, muscle tone, upper limb capacity, and basic ADL.
Dose of Robot Therapy for the Upper Limb
Pooling FMA arm scores resulted in a significant SES for the dose-matched trials (MD 2.28, 95% CI 0.89 to 3.68; Z = 3.21, P = .001, I2 = 33%; 26 RCTs, 32 comparisons, N = 808) and a nonsignificant SES in non–dose-matched trials (MD 1.07, 95% CI −5.35 to 7.49; Z = 0.33, P = .74, I2 = 0%; 2 RCTs, 2 comparisons, N = 76). For the FMA-SEC, significant effects were found in both dose-matched (MD 2.73, 95% CI 1.24 to 4.21; Z = 3.59, P = .0003, I2 = 40%; 11 RCTs, 14 comparisons, N = 263) and non–dose-matched trials (MD 6.45, 95% CI 0.92 to 11.98; Z = 2.29, P = .02, I2 = 0%; 2 RCTs, 2 comparisons, N = 50). For the FMA-WH, a nonsignificant SES (MD 1.33, 95% CI −0.83 to 3.50; Z = 1.21, P = .23, I2 = 75%; 14 RCTs, 18 comparisons, N = 337) was found in the dose-matched trials, and a significant SES in the non–dose-matched trials (MD 0.96, 95% CI 0.61 to 1.31; Z = 5.39, P < .00001, I2 = 0%; 3 RCTs, 3 comparisons, N = 106). The SES for strength of the paretic arm was nonsignificant for both dose-matched (SMD 0.12, 95% CI −0.20 to 0.43; Z = 0.72, P = .47, I2 = 50%) and non–dose-matched trials (SMD 0.73, 95% CI −0.07 to 1.54; Z = 1.79, P = .07, I2 = 59%). Pooling data for basic ADL resulted in a nonsignificant SES for dose-matched trials (SMD 0.12, 95% CI −0.20 to 0.43; Z = 0.72, P = .47, I2 = 50%) and a significant heterogeneous SES in non–dose-matched trials (SMD 0.85, 95% CI 0.32 to 1.38; Z = 3.12, P = .002, I2 = 54%).
Muscle tone and upper limb capacity could only be pooled in the dose-matched trials, resulting in a significant homogeneous SES for muscle tone (SMD 0.26, 95% CI 0.05 to 0.47; Z = 2.46, P = .01, I2 = 28%) and a nonsignificant homogeneous SES for upper limb capacity (SMD 0.06, 95% CI −0.09 to 0.22; Z = 0.80, P = .43, I2 = 0%).
Safety
None of the included RCTs reported serious adverse events. Klamroth et al 68 described 1 mild adverse event: a patient reported shoulder pain. After an interruption of 3 training sessions, the patient completed the training program without further adverse events. In the 44 included RCTs, a total of 63 patients dropped out: 27 in the experimental arm, 34 in the control arm, and 2 for whom the trial arm was unclear. All dropouts were unrelated to the applied intervention.
Discussion
The present systematic review and meta-analysis of the effects of RT-UL in stroke patients shows that the number of phase II trials has been growing rapidly in recent years; 34 new trials were published in the past 8 years. The main conclusion is that the use of robotic devices may significantly improve motor control and strength of the upper paretic limb poststroke and appears to be safe. However, safety issues were not systematically reported in the 44 included trials and the clinical relevance of the found effects is limited. For shoulder/elbow robotics, for example, the overall significant effect on (synergy-independent) motor control reflects an average difference of about 2.15 out of 42 points (or ~5%) on an FMA-SEC score (95% CI 0.73 to 3.57). It is plausible that this difference falls within the measurement error, which has been established to be about 6 to 7 points for the FMA upper limb subscale (includes the FMA-SEC plus FMA hand). 76 Despite these improvements in motor control, no beneficial effects on upper limb capacity were found in the present systematic review. The main message of this updated review concerning RT-UL poststroke is therefore unchanged when compared to our 2008 review. 17 Although robotic technology has been developing fast over the past years, the more recent trials hardly differ from those published before 2008 in terms of content of RT-UL therapy, lack of stratification, and outcomes used. However, in line with our previous review, but in contrast to the recent Cochrane review of Mehrholz et al, 19 the present meta-analysis suggests that RT-UL may not benefit basic ADL more than usual care or no treatment.
Impact of Patient Selection and Timing Poststroke
It is important to note that the current favorable outcomes are mainly found by pooling trials that investigated shoulder/elbow robotics started beyond the first 3 months poststroke. This evidence is still lacking for robotics that target other joints of the upper limb, as well as for RT-UL interventions started within the first 3 months poststroke. This is in stark contrast with the growing evidence for increased levels of neuroplasticity in the first weeks poststroke shown in animal studies. 77 The absence of evidence for RT-UL early poststroke may relate to the arbitrary recruitment of stroke patients with respect to timing and selection within the first 3 months poststroke, resulting in heterogeneity within and between trials. Also, the broad variation of upper limb function at time of inclusion contributes to between-study heterogeneity. Baseline values of the FMA arm in the intervention groups ranged from 5.8 (SD 3.8) 43 to 50 (interquartile range 39, 58) points, 72 so patients do not seem to be prognostically comparable between trials. There is growing evidence that both appropriate selection of stroke patients with a potential for recovery at baseline78-80 and fixed timing of assessments early poststroke are fundamental for designing future RT-UL trials when started within the first 3 months poststroke.
Impact of Dosing of Robot Therapy for the Upper Limb
In contrast to our previous systematic review, positive effects of RT-UL were found in dose-matched as well as in non–dose-matched trials. However, not all robot treatment protocols of included trials were transparent in the actual amount of applied therapy. In order to ascertain that it is the type of RT-UL and not the intensity of practice that is the main driver of stroke recovery, most recent trials were dose-matched in terms of time spent in exercise therapy. In the non–dose-matched trials, the RT-UL groups on average received more therapy, varying from 1.5 to almost 5 times more time spent in therapy per week in the RT-UL group. Most non–dose-matched RT-UL trials started beyond the first 3 months poststroke, with the exception of the trial by Yoo et al. 64 It should be noted, however, that in a number of these trials the RT-UL group received RT-UL in addition to the therapy also applied to the control group,30,64 whereas in a few trials the RT-UL group exclusively received robotic therapy.32,35,40,42,51 Ideally, dose-matched trials should be conducted in which the amount of total therapy time as well as intensity in terms of number of repetitions are standardized.
Impact of Robot Type
A worldwide accepted classification for rehabilitation robotics is lacking. In line with lower limb robotics, 81 our sensitivity analyses investigating subgroups showed a trend favoring the use of end-effector devices. However, our review also confirms that the effects of the technologically more complex and expensive exoskeleton devices have been insufficiently investigated. 15 Therefore, our conclusions with respect to robot type should be interpreted with caution, since the statistical power of all trials was low, except in 1 phase III trial. 48 At least it can be concluded that pragmatic phase III and cost-effectiveness (phase IV) trials, including equivalence trials, on RT-UL are justified.
An analysis by targeted joints showed that shoulder/elbow robotics, with 21 trials, was the most investigated type. For robots that targeted other joints, only a low number of small trials was available. Pooling was therefore possible for only a limited number of robot types, resulting in underpowered analyses, and the superiority of a specific type of robot remains unclear. However, it might be suggested that the principle of specificity of motor learning also applies to RT-UL, as shoulder/elbow robotics improve FMA-SEC scores, but not FMA-WH scores, when compared to non-robotic treatment. Confirmation by pragmatic RT-UL trials, for example of wrist/hand robotics, is needed.
Looking Back
Despite the progress made in robot technology in the past 8 years, our findings on the effects of RT-UL are in line with the conclusions of our previous systematic review. 17 Foremost, we report a positive trend on motor recovery for RT-UL compared with conventional therapy. Our findings are also in line with the results reported by Mehrholz et al, with the exception of the outcome for ADL. 19 In contrast to our results, Norouzi et al 18 did not find any effects of RT-UL on motor recovery in their review after pooling dose-matched trials. However, our review included more RCTs. Also in line with previous systematic reviews,17-19 the present meta-analysis shows heterogeneity among the included trials regarding the effects of RT-UL. Even when controlling for moderating factors such as robot type, treatment intensity, and target of RT-UL, the heterogeneity between trials remained. In our opinion, the relatively large variation in claimed effects between trials is mainly caused by differences in the selection of patients at baseline, regarding characteristics that determine functional prognosis of the upper limb poststroke. Moreover, recent prognostic cohort studies have shown that outcome of upper limb capacity is highly predictable in the first days poststroke.5,82 Consequently, the probability of achieving meaningful change in arm/hand capacity is mainly determined by selecting patients with at least some minimal voluntary control of wrist and finger extension at baseline.17,19
Limitations
The present review has some limitations. First, some studies were conducted at the same time and at the same place, so we cannot be 100% certain that the study populations used in trials were all unrelated.32,35,40,42,54,55,57,69 Second, no distinction was made between different types of control interventions, which may influence the significance of the effects of the intervention of interest. Third, we focused on a few main clinically meaningful outcomes targeted by RT-UL, and ignored findings regarding movement kinematics that may be more responsive to change. 83 Fourth, we could not obtain data from 2 conference abstracts.84,85 Fifth, we combined the postintervention ARAT and WMFT scores in order to determine the summarized effects for upper limb capacity as it is suggested that they measure the same construct. This assumption was mainly based on the high concurrent validity between both assessments in which Spearman’s rank correlation coefficient ranged from 0.70 to 0.86. 86 However, the high concurrent validity does not exclude that both tests measure two closely related constructs. Additional separate meta-analyses for the ARAT and WMFT showed no significant effects in favor of RT-UL. Sixth, as a reflection of intensity, we defined treatment contrast as time spent in RT-UL. We acknowledge that the number of repetitions performed is a more accurate reflection of intensity. Seventh, study selection was done by 1 reviewer; however, we checked relevant reviews such as that of Mehrholz et al 19 for possible relevant trials we missed. Finally, this systematic review focused on the added effects of RT-UL alone, and ignored the impact of type of visual feedback and games as integral parts of the RT-UL. There is growing evidence that virtual reality training, including gaming, may enhance treatment effects when combined with RT-UL. 87 In particular, the interaction between virtual reality training and RT-UL in 1 training system may make it more challenging for patients and increase their compliance to practice intensively. 12 With that, RT-UL should not be seen as a stand-alone therapy, but should rather be integrated in a comprehensive treatment package to optimize motor performance. In the future, this neglected combination treatment should be taken into consideration when designing protocols for RT-UL trials.
Implications for Research
The present systematic review and meta-analysis has some major implications for further research in the field of robot-assisted therapies of the upper limb. First, the current review underlines the need for clear definitions and a common classification of rehabilitation robotic devices. This will allow meaningful comparisons between the growing number of devices developed in stroke rehabilitation. This classification should preferably be based on the amount of support (ie, active vs passive devices), type of support (ie, whether the patient or the robot is in charge of controlling motor performance), as well as the number of joints (or segments) involved, and hence, the degrees of freedom that the device controls.83,88
Second, the present review shows a lack of transparency in investigated robot-assisted treatment protocols in terms of the actual dosing of therapy and type of robot control (ie, active control, passive control, or assist-as-needed control). These treatment protocols should be based on fundamental knowledge about motor learning and the use of compensatory strategies in daily activities during the different recovery phases poststroke.88,89 This requires distinguishing between behavioral restitution of neurological impairments, as reflected by significant and clinically relevant changes in FMA arm scores, on the one hand, and behavioral substitution, that is, learning to use adaptation strategies to accomplish meaningful tasks, on the other.88-90 For example, several cohort studies have shown that FMA arm scores are invariant after 3 months poststroke.6,83,89,91-94 This raises the question whether the FMA arm score is a responsive measure of outcome in RT-UL trials starting beyond this time window. 88 It should also be acknowledged that improvements in the quality of motor performance of the upper limb beyond the first 3 months are mainly driven by optimization of end-effector control (ie, compensation strategies) rather than by behavioral restitution of neurological impairments (ie, true neurological repair).83,90,93 With the exception of the FMA arm scores that reflect the degrees of freedom that patients are able to control as a measure of true neurological repair, 83 clinical outcomes are not able to distinguish between behavioral restitution and use of compensation strategies. With that, quantifying kinetics and kinematics is the only way to understand what patients exactly learn when measuring improvement in quality of motor performance.83,90,95
Third, future high-quality trials should include kinematics (mostly measured by the robot itself) as a more responsive outcome measure to monitor therapy-induced improvements poststroke. 95 A recent study also suggests that the improved control is not restricted to the paretic upper limb alone, but may generalize to the non-affected limb in chronic stroke. 95 Longitudinal changes in kinematics allow a better understanding of how therapy-induced improvements emerge.83,96 For example, Reinkensmeyer et al 97 showed in their sample of 27 chronic stroke patients that motor recovery follows an exponential time course of learning, irrespective of robot support. They derived a neurocomputational model, which showed that the time course of this motor recovery curve depends on patients’ initial ability to activate the unaffected portions of the damaged corticospinal tract system. 97 Similarly, neurocomputational simulation models based on findings from the Extremity Constraint-Induced Therapy Evaluation (EXCITE) trial suggest that motor recovery by constraint-induced movement therapy requires a minimal motor performance threshold to induce meaningful improvements. 98 One may assume that this minimal threshold in upper limb motor performance is also relevant for selecting patients for RT-UL trials.
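What an exponential time course of learning implies can be illustrated with a generic saturating curve. This is a minimal sketch and not the neurocomputational model of Reinkensmeyer et al; the function name, the parameter values, and the weekly time unit are all hypothetical and chosen for illustration only.

```python
import math


def exponential_recovery(t, initial, plateau, tau):
    """Score at time t on a curve rising from `initial` toward `plateau`.

    `tau` is the time constant: after tau time units, about 63% of the
    total possible gain (plateau - initial) has been achieved.
    """
    return plateau - (plateau - initial) * math.exp(-t / tau)


# Hypothetical parameters (not taken from any trial): baseline score,
# asymptotic score, and time constant in weeks.
initial, plateau, tau = 20.0, 26.0, 3.0

# Predicted score at weekly assessments over 12 weeks of training.
curve = [exponential_recovery(week, initial, plateau, tau) for week in range(13)]

# Week-to-week gains shrink as the curve approaches its plateau.
gains = [later - earlier for earlier, later in zip(curve, curve[1:])]

for week, score in enumerate(curve):
    print(f"week {week:2d}: {score:.2f}")
```

In such a curve most of the gain occurs early and later sessions add progressively less, which is one reason why repeated kinematic measurements over time are more informative than a single postintervention assessment.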
Fourth, the trials in this updated review again failed to demonstrate beneficial effects of RT-UL on upper limb capacity. The question is why a transfer of effects found on the body function level is lacking. One possible reason is that in current RT-UL the robot is often in charge and mostly focused on impairments, without allowing compensation strategies. 88 What remains unknown with the currently applied upper limb capacity outcome measures are the effects of RT-UL on real-world use of the affected arm. Applying activity monitoring would be one of the solutions to gain insight into the generalizability of RT-UL beyond therapy time. Unfortunately, the only trial in this review that applied miniaturized wearable sensor technology such as accelerometry could not detect significant differences between groups. 73
Fifth, investigating and reporting on safety issues should be an integral part of future trials in RT-UL. Safety reporting should include the medical certification of the robotic device used, stated in the Methods section of each published robot trial.
Finally, next to achieving consensus on the definitions and baseline characteristics used, there is an urgent need for consensus on the outcomes used, so that the evidence from RT-UL trials becomes comparable. 99 In light of improving future phase II/III robotic trials, a better selection of patients who are likely to recover as well as fixed timing of baseline assessment early poststroke are key factors for designing rehabilitation trials early poststroke.91,100 Preferably, patients should be stratified depending on the intactness of the corticospinal tract system (ie, clinically identified by some return of voluntary control of finger extension in the first days poststroke)79,80,100 and the likelihood of showing spontaneous neurobiological recovery in the first days poststroke.78,79
Conclusion
The current review confirms that RT-UL in stroke rehabilitation for the shoulder/elbow and wrist/hand improves synergy-independent motor control of the shoulder/elbow and wrist/hand, respectively. However, the overall effects are small and did not exceed the values of the minimal clinically important differences of the FMA arm scores. We were not able to show evidence for an added value of RT-UL in terms of upper limb capacity when compared to usual care. These findings are similar to those of our previous systematic review, published in 2008. 17 The significant overall effect on motor control was found for those studies that started beyond the first 3 months poststroke, whereas evidence for RT-UL trials starting within the first 3 months is lacking. This finding contrasts with the well-established time window for increased levels of (homeostatic) brain plasticity shown in animal studies. 77 Although the evidence for an interaction effect of time poststroke with effects of RT-UL is unclear, the present study shows that high-quality trials in this early time window are scarce. Furthermore, none of the studies involved a dose-response trial of RT-UL, and cost-effectiveness trials are almost completely lacking. 48 More importantly, in dose-matched trials, we found no evidence that the type of RT-UL is an important factor in the outcome. This suggests that RT-UL with more expensive devices, which are also often able to control more degrees of freedom, is not necessarily better than RT-UL with less expensive devices. This surprising finding may reflect the fact that change in degrees of freedom is mainly restricted to the first 3 months poststroke,83,95 whereas there have been hardly any sufficiently powered trials in this critical time window aimed to improve quality of motor performance. 
In fact, trials that really investigated the impact of RT-UL on kinematics and with that, quality of motor performance, still have not been undertaken in the early phase poststroke.95,101 In our opinion, elucidating the impact of the type of RT-UL on the quality of motor performance requires (a) better understanding of what and how stroke patients learn to accomplish meaningful tasks poststroke, 89 (b) trial designs with a sufficient treatment contrast that are better able to determine the possible added value of robotics on the natural time course of spontaneous neurobiological recovery early poststroke78,79 by aligning intervention timing with respect to stroke onset, and (c) better stratification of patients with a comparable prognosis early poststroke.79,94 A consequence of the recommendations in this systematic review is that designing and testing upper limb robotics is preeminently interdisciplinary. This translational research requires, from the outset, an integrative collaboration between bioengineers, designers, neuroscientists, clinicians (ie, physicians, physical therapists and occupational therapists), clinical epidemiologists (ie, clinical trialists) as well as the end-users (ie, stroke patients) themselves. With that, this review suggests that this translational research in rehabilitation robotics poststroke is still in its infancy.
Footnotes
Acknowledgements
The authors thank Hans Ket for his cooperation in the literature search and Jan Klerkx for his support in translating and editing this manuscript.
Authors’ Note
The funders of this study had no role in design, conduct, data collection, data management, data analysis, data interpretation, or preparation of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: GK has received grants from The Dutch National Institutes of Health (NWO); the Dutch Heart Foundation; The Royal Dutch Society for Physical Therapy (grant number 8091.1); The Dutch National Institute of Health (ZonMw) for the EXPLICIT-stroke trial (number 89000001) and the European Research Council, for an ERC-advanced grant (number 291339-4D-EEG). EEHvW has received grants from the Dutch National Institutes of Health (ZonMw), the Dutch Technology Foundation STW, and a fellowship from the Dutch Brain Foundation (Hersenstichting Nederland). CGMM received a fellowship from the Dutch Brain Foundation (Hersenstichting Nederland) and received grants from the Dutch National Institute of Health (ZonMW) and the Dutch Technology Foundation STW.