Abstract
A human–robot interactive control method is proposed to govern the assistance provided by a lower limb exoskeleton robot to patients in gait rehabilitation training. The rehabilitation training robot has two lower limb exoskeletons driven by a pneumatic proportional servo system, with two rotational degrees of freedom for each lower limb. An adaptive admittance model is adopted because of its suitability for human–robot interaction. The adaptive law of the admittance parameters is designed with the sigmoid function and a reinforcement learning algorithm, so that individualized admittance parameters suitable for each patient are obtained by reinforcement learning. Experiments in passive and active rehabilitation training modes were carried out to verify the proposed control method. The passive rehabilitation training results verify the effectiveness of the inner-loop position control strategy, which meets the gait tracking accuracy demands of rehabilitation training. The active rehabilitation training results demonstrate that the interactive controller provides personal adaptation and active compliance in robot assistance for patients. The combined effects of the flexibility of the pneumatic actuators and the compliance provided by the controller contribute to training comfort, safety, and therapeutic outcome in gait rehabilitation.
Introduction
Robot-assisted gait rehabilitation solutions, as therapeutic adjuncts to clinical practice, have been shown to be effective for patients suffering from neurological impairments such as stroke and spinal cord injuries. 1–3 Gait rehabilitation training robots can be grouped into end-effector driven and joint driven robots according to their mechanisms. 4 For end-effector driven robots (e.g. Gait Trainer, 5 Haptic Walker, 6 and LokoHelp 7), only the patient’s soles are fixed to and guided by the robot’s foot plates. Joint driven robots (e.g. Lokomat, 8 Lower extremity powered exoskeleton (LOPES), 9 Active leg exoskeleton (ALEX), 10 Hybrid assistive limb (HAL), 11 and Lower extremity rehabilitation robot (LERR) 12) have leg joints that match those of the patient’s legs. Actuators placed at the robot’s joints control the joint motions to mimic normal walking patterns, providing synchronized movement for the patient’s lower extremity joints. Therefore, wearable robots of the multi-joint driven type are the most suitable for patients in gait rehabilitation training because of the physical coupling between robot and patient. However, motion coupling and interaction exist between the exoskeleton mechanism and the human lower limbs. The wearer and the exoskeleton form a closed loop as a human–robot cooperation system. Human–robot coordination is no longer a master–slave relationship but a partnership of mutual perception and cooperation. Consequently, the control strategy for the cooperation system has developed from trajectory tracking control to interactive control, such as compliant control, patient-cooperative control, and assist-as-needed (AAN) control. 13 Interactive control based on perceiving patients’ motion intention is an active and difficult research topic for rehabilitation training robots.
Literature has suggested that patients should be encouraged to participate actively and the role of rehabilitation robots should be to only scaffold or support when it is required for an effective treatment. 14,15 From this point of view, interactive control suggests that the robots only supply as much effort as a patient needs to accomplish rehabilitation training tasks by assessing his or her performance in real-time 16,17 ; whereas, trajectory tracking control is primarily used for early stage rehabilitation when patients have very weak muscle strength.
Assistance from the robot is expected to be adjusted intelligently according to the patients’ physical conditions and efforts in rehabilitation, so as to encourage their voluntary participation. There is energy transfer between patients and rehabilitation robots in the interactive system, which is represented by the dynamic coordination of position and force. Three groups of interactive control schemes are widely used: hybrid force–position control, impedance control, and admittance control. 18
Hybrid force–position control takes into consideration the joint trajectories and the interaction forces between the wearer and the exoskeleton. It splits the task space into two complementary subspaces using a selection matrix. A position control strategy is implemented in one of the subspaces and a force control strategy in the other. For example, in the control of the ALEX, 10,19 the interaction forces between the wearer and ALEX are measured through two force/torque sensors mounted at the thigh and shank. The basic idea of impedance control is to regulate the dynamic relation between the wearer and the exoskeleton by relating the position error (e.g. joint angles) to the interaction force/torque through mechanical impedance. For instance, in the control of Lokomat, 20,21 the outer loop is composed of position feedback and the inner loop of force feedback. The adjustable range of the reference trajectory is established according to the patient’s joint angle information. A correcting force/torque for the robot is generated from the angle error by the impedance algorithm. The stiffness parameter
A new adaptation law is proposed in this study. Reinforcement learning is introduced into the parameters adjusting process. Based on the learning algorithm, an admittance model including patient’s personal characteristics is established for the interactive control in active rehabilitation training. Our group developed a pneumatic wearable lower limb exoskeleton robot. Pneumatic actuators provide a certain degree of passive compliance, while the proposed admittance control algorithm based on reinforcement learning provides active compliance by adapting robot behavior to patients. The combined effects of flexibility of actuators and compliance of control can improve the training comfort, safety, and therapeutic outcome in the gait rehabilitation. The robot is capable of providing assistance at low compliance level to severely impaired subjects and can adapt the compliance to an increased level for subjects with less severe impairments.
Pneumatic driven gait rehabilitation training robot and force/position detection
The gait rehabilitation training robot consists of a weight-reducing suspension device, a weight support mechanism, a hip adjustment mechanism, two exoskeletons of lower-limb, and a treadmill, 25 as shown in Figure 1(a).

Structure scheme of gait rehabilitation robot: (a) mechanism layout and (b) human–robot contact force detection device installation in robot.
The exoskeleton assists patients and coordinates with the treadmill speed to complete repetitive gait rehabilitation training. The two actuated leg exoskeletons are attached to the patient’s legs. Following a bionic design approach, and considering that human walking motion is mainly in the sagittal plane, each lower extremity exoskeleton is a two-degree-of-freedom (2-DOF) linkage mechanism with a hip joint and a knee joint rotating in the sagittal plane. The linkage mechanism is actuated by pneumatic cylinders to rotate about the axis of each joint and induce flexion and extension movements, combining rigid support with a flexible pneumatic drive. In addition, the weight-reducing suspension mechanism partially reduces the load on the patient’s lower limbs and balances the change in height of the center of gravity during walking.
A linear potentiometer is installed on each cylinder, and the joint angle is calculated from the geometric relationship of the mechanism. A human–robot contact force detection device is installed on the thigh linkage of the robot, as shown in Figure 1(b). It consists of an inner ring, an outer connecting block, and symmetrically arranged film pressure sensors (FSRs). The inner ring is tied to the patient’s lower limb and moves with the affected limb, while the outer connecting block is fixed on the robot’s thigh. The convex part of the inner ring fits into the groove of the outer connecting block, and the FSR sensors are installed inside, one on the front side and one on the back side. When the gait rehabilitation training robot is not synchronized with the movement of the human lower limb, the human–robot contact force in the sagittal plane is measured by the FSRs in real time.
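As an illustration of how a joint angle might be recovered from a measured cylinder displacement, the following minimal sketch assumes that the joint axis and the two cylinder pivots form a triangle, so that the law of cosines applies. The function name, the pivot distances, and the retracted cylinder length are hypothetical; the robot's actual geometric relations are not reproduced here.

```python
import math


def included_angle(displacement_m, a_m, b_m, retracted_len_m):
    """Angle (rad) at the joint axis between the two pivot arms.

    Assumed geometry (hypothetical, for illustration only):
      a_m, b_m         distances from the joint axis to the two cylinder pivots
      retracted_len_m  pivot-to-pivot length with the rod fully retracted
      displacement_m   rod extension measured by the linear potentiometer
    """
    c = retracted_len_m + displacement_m  # current pivot-to-pivot length
    cos_phi = (a_m ** 2 + b_m ** 2 - c ** 2) / (2.0 * a_m * b_m)
    cos_phi = max(-1.0, min(1.0, cos_phi))  # guard against rounding outside [-1, 1]
    return math.acos(cos_phi)


# Hypothetical dimensions: a 0.3 m / 0.4 m / 0.5 m triangle gives a right angle.
phi = included_angle(0.05, a_m=0.3, b_m=0.4, retracted_len_m=0.45)
print(math.degrees(phi))
```

In practice a fixed offset would be subtracted from this included angle to obtain the anatomical joint angle; that offset depends on the mounting of the pivots and is left out of the sketch.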
Interactive active compliance control strategy based on admittance model
Determination of human–robot interactive model—Admittance model
Seen from the instantaneous power flow between two or more physical systems, physical systems come in only two types: admittances, which accept effort (e.g. force) inputs and yield flow (e.g. motion) outputs; and impedances, which accept flow (e.g. motion) inputs and yield effort (e.g. force) outputs. 26 For two physical systems in contact with each other, their characteristics should complement each other: if the environment is an impedance, the robot should show admittance characteristics, and vice versa. In the rehabilitation task, the patient is regarded as the environment of the robot. When the patient has active walking consciousness, the affected lower limb has a certain movement ability. Uncoordinated movements between the robot and the patient produce a contact force, so the environment has impedance characteristics. The robot controller should in turn be designed to eliminate the contact force: it takes the contact force as input and responds with a motion adjustment. Therefore, the interactive model between the two physical systems (i.e. the patient and the robot) has admittance characteristics, as shown in Figure 2. It can be seen that there is no force closed loop in the interactive control. The compliance in interaction is embodied in the dynamic relationship between force and position, that is, in the characteristics of the admittance model.

Human–robot interaction in gait rehabilitation training.
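The force-in, motion-out behavior described above can be illustrated with a minimal sketch. It assumes a first-order admittance with one damping and one stiffness coefficient, B·d(Δθ)/dt + K·Δθ = F, integrated with an explicit Euler step. The model order, the symbol names, and all numerical values are assumptions made for illustration, not the paper's identified model.

```python
def simulate_admittance(forces, damping, stiffness, dt):
    """Integrate B * d(delta)/dt + K * delta = F with explicit Euler.

    forces     sequence of contact-force samples (input, N)
    damping    B (assumed units N*s/rad)
    stiffness  K (assumed units N/rad)
    dt         sample time (s); needs dt * K / B < 2 for a stable Euler step
    Returns the list of position adjustments (output, rad).
    """
    delta = 0.0
    adjustments = []
    for force in forces:
        delta += dt * (force - stiffness * delta) / damping
        adjustments.append(delta)
    return adjustments


# A constant 10 N contact force held for 10 s at a 100 Hz sample rate.
out = simulate_admittance([10.0] * 1000, damping=5.0, stiffness=20.0, dt=0.01)
print(out[-1])  # settles near F / K
```

Under this assumed model, a smaller stiffness gives a larger steady-state adjustment for the same force, and a smaller damping gives a faster response, which is the kind of trade-off an adaptive law has to manage.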
The output of admittance model is the adjustment of position denoted as △
The admittance model is the dynamic relationship between the human–robot interaction force
where
The admittance model transfer function (3) is obtained through the Laplace transformation from equation (2)
It can be seen from formula (3) that

Effects of admittance parameters. (a) Effect of stiffness coefficient
From the simulation results, we can see that stiffness coefficient
Admittance parameter adjustment algorithm based on reinforcement learning
In active rehabilitation training, the adaptive parameters (
The sigmoid function is used in the design of the adaptive law. It is continuous, monotonically increasing, and bounded on its domain (−∞, +∞), with values ranging from 0 to 1, as shown in Figure 4(a). It is defined as $S(x) = \frac{1}{1 + e^{-x}}$.
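The properties listed for the sigmoid function (monotonic, bounded between 0 and 1) can be checked numerically with a short sketch of the standard logistic form. The branch on the sign of the argument is an implementation detail added here to avoid floating-point overflow; it is not part of the paper.

```python
import math


def sigmoid(x):
    """Standard logistic sigmoid, written to stay finite for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x))
```

The function equals 0.5 at the origin and saturates toward 0 and 1 for large negative and positive arguments.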

Functions’ graphic representation: (a) sigmoid function
Based on Sigmoid function, the used function is
where the parameter
An adaptive law is proposed to tune the admittance parameters based on equation (5), and the adjusting values of
In equation (6),

Adaptive admittance parameters adjustment method based on reinforcement learning.
Because the admittance model describes the dynamic relationship between position and force, the choice of learning parameters is related to them. Joint angle error
(1) State space
Joint angle error
(2) Action set
Action set
(3) Policy π and probability of state transition
Sarsa(0) is an on-policy control method that attempts to evaluate or improve the same policy that is used to make decisions. The
where
(4) State action-value function
It is easy to devise incremental formula for updating state action-value function as follows
where
where
(5) Adjustment of the parameters:
An incremental formula of the adjustment coefficients
where
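The learning steps listed above can be sketched with the textbook tabular Sarsa(0) update, Q(s, a) ← Q(s, a) + α[r + γQ(s′, a′) − Q(s, a)]. The ε-greedy action selection, the table layout, and the values of α and γ are assumptions chosen for illustration; the reward definition and the parameter increments used in this study are not reproduced.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma):
    """One on-policy Sarsa(0) backup; returns the temporal-difference error."""
    td_error = reward + gamma * q[s_next][a_next] - q[s][a]
    q[s][a] += alpha * td_error
    return td_error


# Tiny hypothetical table: 2 states x 2 actions.
q = [[0.0, 0.0], [0.0, 2.0]]
rng = random.Random(0)
a_next = epsilon_greedy(q[1], epsilon=0.0, rng=rng)  # greedy choice in state 1
sarsa_update(q, s=0, a=1, reward=1.0, s_next=1, a_next=a_next, alpha=0.5, gamma=0.9)
print(q[0][1])
```

Because the backup uses the action actually selected in the next state, the same policy is both followed and evaluated, which is what makes the method on-policy.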
Interactive control strategy of independent joint
In the early stage of rehabilitation, passive rehabilitation training should be used to overcome the psychological barrier of patients who dare not walk. After this stage, patients enter the recovery stage, in which they intend to participate actively. During this period, the patient’s active movement generates a contact force that acts on the robot as an input from the external environment. The robot with the varying-parameter admittance controller adjusts its behavior according to the interactive force feedback and moves near the reference gait trajectory to achieve active compliant rehabilitation training. In addition, the admittance parameters are tailored to the patient by the adjustment algorithm based on reinforcement learning. The interactive control strategy of an independent joint based on the adaptive admittance model is shown in Figure 6.

Interactive control strategy of independent joint based on an adaptive admittance model.
The control scheme consists of a position controller implemented in joint space and an admittance controller in interaction space. Trajectory tracking of the robot is achieved by controlling the displacement of each cylinder in joint space. Each cylinder is controlled by a pneumatic proportional valve. In consideration of the time delay and nonlinear characteristics of the pneumatic system, an improved proportional–integral–derivative (PID) controller with a dynamic feedforward algorithm is designed. The dynamic feedforward algorithm consists of a velocity feedforward (VFF) algorithm and a gravity compensator (GC), as shown in formula (11)
where
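The structure of such a position controller, PID feedback plus velocity feedforward and gravity compensation, can be sketched as follows. This is a generic discrete-time form written under assumptions: the class name, the gains, and the way the gravity term is supplied are hypothetical and do not reproduce formula (11).

```python
class FeedforwardPID:
    """Discrete PID feedback with additive velocity and gravity feedforward."""

    def __init__(self, kp, ki, kd, kv, kg, dt):
        self.kp, self.ki, self.kd = kp, ki, kd  # feedback gains
        self.kv = kv                            # velocity feedforward gain
        self.kg = kg                            # gravity compensation gain
        self.dt = dt                            # sample time (s)
        self.integral = 0.0
        self.prev_error = None

    def step(self, ref, ref_vel, meas, gravity_term=0.0):
        """Return the control output for one sample.

        ref, meas     desired and measured displacement
        ref_vel       time derivative of the reference (feedforward input)
        gravity_term  configuration-dependent gravity load estimate
        """
        error = ref - meas
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no derivative kick on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        feedback = self.kp * error + self.ki * self.integral + self.kd * derivative
        feedforward = self.kv * ref_vel + self.kg * gravity_term
        return feedback + feedforward


controller = FeedforwardPID(kp=2.0, ki=0.0, kd=0.0, kv=0.5, kg=1.0, dt=0.01)
print(controller.step(ref=1.0, ref_vel=4.0, meas=0.5, gravity_term=0.3))
```

The feedforward terms act on the reference rather than on the error, so they can drive the actuator before a tracking error builds up, which is the usual motivation for adding them to a system with time delay.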
Simulation and analysis
Simulations are conducted to verify the feasibility of the interactive control system, with the hip and knee joint angle data in the Clinical Gait Analysis (CGA) 28 database as the reference trajectories. The database was established by capturing a large amount of motion information from healthy people during walking with the 3-D Motion Capture System of Northern Digital Technologies Inc (NDI; Ontario, Canada). Each group of motion information includes 50 equally spaced joint angle sampling points over a complete gait cycle for the hip, knee, and ankle joints. The interactive control system is presented in Figure 7.

Interactive control system for pneumatic-driven gait rehabilitation training robot. GC: gravity compensator; VFF: velocity feedforward.
Simulation models and parameters
Each leg of the exoskeleton robot is a two-link serial manipulator consisting of two independent rotating joints, hip and knee, each actuated by its own pneumatic proportional valve-controlled cylinder system. Two pneumatic proportional valves (Festo (Germany), MPYE-5-M5-010-B) are used to control the hip cylinder (SMC, CM2C32-125Z) and the knee cylinder (SMC, CM2B32-80) at a supply pressure of 0.4 MPa. Based on the mathematical model of the valve-controlled cylinder system, 29 the transfer function between the control voltage
Also the mathematical model of knee valve-controlled cylinder system is calculated as follows
The geometric relations between joint angle
Each leg of the gait rehabilitation training robot is a 2-DOF mechanism. Its dynamic model is as follows
where subscript “1” represents hip joint parameters while subscript “2” represents knee joint parameters,
The parameters of the robot in equation (16) are obtained with ProE software (Pro/Engineer WildFire5.0), and the parameters of the human lower limbs are taken from the national standards of the People’s Republic of China: Inertial parameters of adult human body (GB/T 17245-2004) and Human dimensions of Chinese adults (GB 10000-88). Taking the exoskeleton robot worn by a patient of 1.75 m height and 70 kg weight as an example, the dynamic model parameters are listed in Table 1.
Human–robot system dynamic parameters.
Simulation and analysis
(1) Validity of admittance parameters learning algorithm based on Sarsa(0)
The parameters of the reinforcement learning algorithm are set by trial and error, where

Learning processes of admittance parameters based on Sarsa(0). (a) 10 times learning of hip joint, (b) 50 times learning of hip joint, and (c) 50 times learning of knee joint.
Comparing the results of 10 learning iterations (Figure 8(a)) with those of 50 learning iterations (Figure 8(b)), we can see that the damping and stiffness parameters gradually stabilize as the number of learning iterations increases. It takes about 0.032 s to complete 50 learning iterations. Figure 8(b) and (c) indicate that the admittance parameters are smaller when the human–robot contact force is large. The value of
(2) Comparison of Sarsa(0) learning algorithm and non-learning admittance control algorithm
Under the same conditions of the time-varying human–robot contact force, the effects of learning and non-learning control algorithms on system performance are compared, as shown in Figure 9.

Comparison of Sarsa(0) learning algorithm and non-learning algorithm.
With fixed settings of
Comparison of the two algorithms in admittance parameter adjustment.
It can be seen that the adjustment range of
(3) Personalized features in reinforcement learning algorithm
Another contribution of reinforcement learning is that the model reflects the patient’s personalized characteristics by learning from the patient’s own sample data. Patients need different treatment schemes because their conditions differ. The joint angle and the human–robot contact force are selected as the two state variables of the algorithm, and both carry the personalized characteristics of the patient in training.
As shown in Figure 10, under the same time-varying human–robot contact force (curve in the lower left corner), different admittance parameters are obtained by Sarsa(0) learning with two different groups of CGA data (curves in the lower right corner). The admittance parameters obtained with the CGA2 group, which has the larger joint angle, are relatively smaller. According to the preceding analysis, smaller parameters lead to a larger angle adjustment and a faster adjustment speed. Therefore, the influence of each patient’s joint angle during walking is reflected in the controller parameters produced by the learning algorithm, which thus carry that patient’s personalized characteristics.

Admittance parameters learning curves with different CGA data. CGA: clinical gait analysis.
Experimental results
The interactive control experiments are conducted on the prototype of gait rehabilitation training robot (Figure 11) under two modes of passive rehabilitation training and active rehabilitation training.

Experimental system of the rehabilitation training: (a) human–robot contact force detection device and (b) gait rehabilitation training robot prototype and tester.
Experimental system and protocol
At present, the robot is at the prototype stage. A healthy person (male, 26 years, 1.75 m height) participated in this study. Before the training, he put on the exoskeleton robot prototype, which was adjusted to the length of his lower limbs and fastened at the hip and knee joints with straps. The inner ring of the human–robot contact force detection device that we developed fitted closely around the tester’s thigh. During walking, the human–robot contact force was measured in real time. Two linear potentiometers (KTM-125 L (Taizhou Jiaojiang Xiyu Electronics Factory, Taizhou, China), KS15-150-05 (Jinan Kaisi Technology Co., Ltd., Jinan, China)) installed on the hip cylinder (SMC (Japan), CM2C32-125Z) and the knee cylinder (SMC, CM2B32-80) measured the displacements, from which the corresponding joint angles of the robot were calculated. The measurement and control system was built on the semi-physical real-time system provided by the LinksRT company (Beijing LINKS Technology Co., Ltd., Beijing, China), including an A/D board (Ni6259) for analog input and a D/A board (Ni6216) for output to the pneumatic proportional valves (Festo, MPYE-5-M5-010-B). The pneumatic system was supplied at a pressure of 0.4 MPa.
The treadmill speed was set to 0.8 km/h in the active rehabilitation training experiment. Sensor data for all experiments were collected at 100 Hz. The hip and knee sagittal plane physiological gait data in the CGA database were used to define the reference joint angle trajectories.
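Since the reference data provide 50 equally spaced samples per gait cycle while signals are sampled at 100 Hz, the reference has to be resampled to the control rate. The sketch below shows one plausible way to do this with periodic linear interpolation; the interpolation scheme, the function name, and the synthetic sine-shaped cycle standing in for real gait data are all assumptions.

```python
import math


def resample_gait_cycle(samples, cycle_s, rate_hz, n_cycles=1):
    """Periodic linear interpolation of one gait cycle onto a fixed-rate grid.

    samples   joint angles at equally spaced phases of one cycle
    cycle_s   gait cycle duration (s)
    rate_hz   output sample rate (Hz)
    Returns n_cycles * cycle_s * rate_hz reference values.
    """
    n = len(samples)
    n_out = int(round(n_cycles * cycle_s * rate_hz))
    reference = []
    for k in range(n_out):
        # Position within the cycle, expressed in units of input samples.
        phase = ((k / rate_hz) % cycle_s) / cycle_s * n
        i = int(phase) % n
        frac = phase - int(phase)
        nxt = samples[(i + 1) % n]  # wrap around at the end of the cycle
        reference.append((1.0 - frac) * samples[i] + frac * nxt)
    return reference


# Synthetic 50-point cycle (placeholder, not gait data), 4 s cycle, 100 Hz.
cycle = [20.0 * math.sin(2.0 * math.pi * k / 50) for k in range(50)]
ref = resample_gait_cycle(cycle, cycle_s=4.0, rate_hz=100.0, n_cycles=2)
print(len(ref))
```

With a 4 s gait cycle this yields 400 reference points per cycle, and with a 5.6 s cycle it yields 560, so the same 50-point data set can serve different training speeds.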
Passive rehabilitation training experiment
The treadmill was stationary in the passive rehabilitation training experiment. The tester stood on the treadmill with his right leg as support. His left leg was fitted into the exoskeleton and lifted, driven entirely by the robot, which simulated a patient’s passive rehabilitation training; that is, the left lower limb made no active effort. The gait cycle was 5.6 s, and the experiment covered 10 gait cycles of passive training. The parameters of the position controller (formula (11)) in the experiment are given in Table 3.
Position controller parameters.
The synchronous detection of the cylinder displacements

Experimental curves of passive rehabilitation training.
The measured human–robot contact force is very small in passive rehabilitation training. In the preparation stage from 0 to 10 s, the exoskeleton robot drives the tester’s lower limb from the upright state to the leg-lifting state. The joint angle varies slowly, and the human–robot contact force is only about 1 N during this period. In the normal gait cycles that follow, the human–robot contact force increases slightly because of the large range of joint motion. The range of the human–robot contact force in the experiment is (0.81, 2.98) N. Because the purpose of passive rehabilitation training is to make the affected limb follow the predetermined trajectory without active consciousness, there is no force outer loop in the control strategy and only the inner-loop position controller acts. The measured displacement curves show that the time lag of the pneumatic system is suppressed by the velocity feedforward of the position controller, so the cylinder displacements are synchronized with the desired trajectories in each gait cycle. However, errors and oscillations persist at the peak value of the hip joint. The peak occurs at the transition from joint extension to flexion, where the cylinder must change direction, and reversal introduces uncertain factors such as friction and impact. The friction moment is not compensated in the control strategy. Another reason is that the robot hip joint bears the weight of the human lower limb and the exoskeleton mechanism, while the gravity compensation coefficient is a fixed value. These factors lead to the errors at the peak value. The knee joint has the same problems, but its error is smaller than that of the hip joint because of its smaller load.
Therefore, the passive rehabilitation training experiment verifies the effectiveness of the inner-loop position control strategy, which can meet the demands of gait tracking accuracy in rehabilitation training.
Active rehabilitation training experiment
In the active rehabilitation training experiment, the tester’s left leg was no longer suspended, and both lower limbs walked on the treadmill. With the treadmill speed set to 0.8 km/h, the gait cycle of walking is 4 s. The reference trajectories of the hip and knee cylinders and the parameters of the position controller are the same as in the passive rehabilitation training experiment. The measured data from the start of the experiment through four gait cycles are selected for analysis and presented in Figure 13.

Experimental curves of active rehabilitation training.
The tester’s initiative increases the human–robot contact force in active gait rehabilitation training. In the preparation stage, when the exoskeleton robot drives the lower limbs from the upright state to the leg-lifting state, the human–robot contact force is only (0.61, 2.15) N, because the joint angle varies slowly, as in passive training. Once normal gait training begins, the human–robot contact force increases, and its range is (−26.21, 11.06) N. The contact force is significantly larger than in passive rehabilitation training because of the tester’s active participation. The purpose of active rehabilitation training is to encourage patients to participate actively, with the robot providing part of the help and adapting to human movements so that the patient feels comfortable. Next, the compliance of the robot is verified with the interactive control strategy, with
Dynamic admittance parameters are the key factors in trajectory adjustment for realizing the compliance of the robot. The recorded experimental curves of active rehabilitation training (Figure 13) show that the human–robot contact force in the initial stage (0–7.3 s) is small, so the parameter adjustments in that stage are not presented. For the four normal gait cycles from 7.3 to 23.3 s, the changes of the admittance parameters under the reinforcement learning algorithm and the hip joint angle adjustment are recorded together with the measured interaction force, as shown in Figure 14.

Adjustment curves of admittance parameters and the hip joint angle trajectory.
Seen from the angle curve of hip joint, the desired angle Parameters In the second gait cycle, the contact force is the largest and the adjustment of the angle is also the largest. With the better cooperation of the human and the robot in the third gait cycle, the adjustment of the angle is also reduced, which shows that the robot dynamically adapts to the lower limbs of the tester who has the active participation. There are two peaks of contact force in each gait cycle. When the contact force is negative, it indicates that the lower limbs of the tester lag behind the robot, then the angle adjustment curve Comparing the measured displacement
Discussion and conclusion
Whether impedance control or admittance control is employed in human–robot interaction, the self-adaptation of model parameters is the focus of current research. Self-adaptation is mainly realized by discrete selection or continuous adjustment. For example, a velocity-based impedance parameter selection method was proposed by Erden and Mari. 30 The velocity threshold and the corresponding damping parameter values were chosen empirically. The principle of this variable impedance control strategy was simple, and it reflected the adaptive adjustment of the model parameters to a certain extent. However, the discrete impedance parameters reduced the controllability and fluency of operation. To achieve continuous model parameter adjustment, researchers began to infer human operation intention by monitoring the motion of the manipulator during human–machine interaction. For example, in the control of Lokomat, 20,21 real-time monitoring of joint angles was adopted, and the stiffness and damping values of the impedance controller were continuously adjusted using the function between the joint angle and the stiffness parameter, as well as the relationship between stiffness and damping. In the control of the CARR robot, 23 the
Most of the adaptive impedance or admittance interactive control strategies mentioned above adjust the parameters through constructed functions, without considering the individualized characteristics of patients or manipulators in the interactive process. The parameters in these functions depend largely on the designer’s subjective intention or on practical experience with a specific task. Although a robot controlled in this way adjusts dynamically to the interaction information and is more flexible than one under interactive control with fixed parameters, the function parameters must be reset for each patient to provide a more natural rehabilitation training experience. Manually modifying the parameters in the functions requires experience and a certain amount of time for exploration, and it is inconvenient for computer control.
To overcome the abovementioned limitations, interactive control based on reinforcement learning is proposed in this study. The interactive control strategy of the rehabilitation training robot uses a two-loop nested control structure of interaction space and joint space to realize two training modes: passive rehabilitation training and active rehabilitation training. The joint space control strategy adopts an independent-joint decentralized control algorithm. Feedforward compensation of the dynamic characteristics is added, which improves the performance of the PID controller given the nonlinearity and modeling error of the pneumatic system. An adaptive admittance model is used in the interactive control strategy to stimulate the training initiative of patients. The parameter adaptive law is designed using a modified form of the sigmoid function to meet the needs of a dynamic, time-varying interaction process. In addition, a personalized parameter learning method based on reinforcement learning is proposed. A 2-D mesh of state variables is constructed by discretizing the human–robot contact force and the joint angle error, so the personalized characteristics of patients are quantified in the learning algorithm. Individualized admittance parameters suitable for the patient are obtained by reinforcement learning. The techniques proposed in this study contribute to personal adaptation and active compliance in robot-assisted rehabilitation training. To the best of the authors’ knowledge, interactive control based on reinforcement learning for a pneumatically driven lower limb rehabilitation training robot has not been reported in the literature.
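The 2-D mesh of discretized state variables mentioned above can be illustrated with a small sketch that maps a contact force and a joint angle error onto a single state index. The bin edges, the row-major indexing, and the function names are hypothetical; the grid resolution used in this study is not reproduced.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Index of the bin containing value; edges are ascending interior boundaries."""
    return bisect_right(edges, value)


def grid_state(force_n, angle_error_deg, force_edges, error_edges):
    """Flatten (force bin, error bin) into one state index, row-major order."""
    i = bin_index(force_n, force_edges)
    j = bin_index(angle_error_deg, error_edges)
    return i * (len(error_edges) + 1) + j


# Hypothetical mesh: 4 force bins x 3 angle-error bins = 12 discrete states.
FORCE_EDGES = [-10.0, 0.0, 10.0]  # N
ERROR_EDGES = [-2.0, 2.0]         # degrees

print(grid_state(-26.21, 0.0, FORCE_EDGES, ERROR_EDGES))
print(grid_state(11.06, 3.0, FORCE_EDGES, ERROR_EDGES))
```

Each state index can then address one row of a tabular action-value function, so the visited cells of the mesh depend on the forces and errors that a particular patient actually produces.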
The next step is to realize active resistance rehabilitation training. Instead of setting the expected human–robot contact force to zero, the desired force will be given according to the patient’s needs. Through the admittance model, the patient must then continually overcome a stable contact force, which provides resistance for the rehabilitation training. At the same time, clinical experiments will be carried out as soon as possible to further optimize the design of the system.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of following financial support for the research, authorship, and/or publication of this article: This study was supported by the “Research on key generic technologies of pneumatic gait rehabilitation training robot” project (Grant No. 172102210036) and “Research on key technologies of human–robot harmony for gait rehabilitation training robot” project (Grant No. 192102210065) granted from “Project of science and technology of the Henan Province.”
