
Abstract

Aiming at robot applications in service, medical treatment, rehabilitation, and other fields, this article designs a humanoid cable-driven hybrid robot that imitates the structure of the human arm. The robot is composed of a cable–spool–pulley system, a series mechanism, and a coaxial spherical parallel mechanism, and it can achieve six-degrees-of-freedom movement in space. A challenge with cable driving is that the movement of the rear-end joints (such as pitch and roll) can alter the length or tension of the cables driven by the front-end joints, resulting in joint coupling. This interference can decrease the motion accuracy of the robotic arm, and it also affects the durability of the cables and of mechanical components such as bearings and pulleys. Considering the joint motion coupling caused by cable driving, a decoupling method is proposed, and the kinematic model of the robot is established. To address the nonlinear coupling characteristics and the uncertainty of the dynamics parameters, a controller is proposed for the humanoid cable-driven hybrid robot that combines proportional-integral-derivative (PID) control based on the decoupling method (DC-PID) with the twin delayed deep deterministic policy gradient (TD3) deep reinforcement learning algorithm. The position and orientation trajectories of the end-effector are tracked using DC-PID and TD3. Simulation results, assessed by trajectory tracking error and training reward, show that the proposed control method has good trajectory tracking and convergence performance. Finally, a prototype of the humanoid cable-driven hybrid robot is developed. The coupling compensation experiments show that the average maximum joint error is reduced by 77.58% after the joint coupling compensation is applied. The controller validation experiments show that the DC-PID controller reduces the maximum error in each axis by 11.54%, 35.29%, and 40.16%, respectively, compared with the open-loop experiments.

Keywords

Cable-driven, hybrid link, joint motion coupling, DC-PID control, TD3 deep reinforcement learning

Introduction

As robot applications deepen, traditional rigid robots increasingly show disadvantages such as a low load–weight ratio, low energy efficiency, and a lack of flexibility when facing the requirements of medical treatment, rehabilitation, human–machine cooperation, and other fields. Therefore, it is necessary to innovate the robot structure. Cable driving^1,2 is a robot drive method proposed in the late 1980s. Flexible cables (such as wire rope) have good flexibility and elastic buffering characteristics, and cable driving can also provide a certain degree of feedback, allowing individuals to sense changes in the external environment during interaction with the machine, thereby adjusting their movements and posture more flexibly and improving the safety and efficiency of human–computer interaction.^3–5 Compared with the traditional connecting-rod hinge drive, force and torque are transmitted between the motor and the robot joint through flexible cables, which removes the limits on hinge angle and telescopic length. The motor and speed reducer can therefore be installed outside the robot body, which greatly reduces the inertia of the moving parts and improves the dynamic performance and response speed of the motor and the driver. The end-effector can thus reach higher speed and acceleration,^6,7 achieving higher working efficiency than the traditional rigid robot.

The cable-driven robot is complementary to the traditional rigid robot and expands the application prospects of robots in service,^8,9 medical rehabilitation,^10–12 surgical treatment,¹³ and so on. Lalithkumar et al.¹⁴ investigated a flexible cable-driven single-hole surgical robot with a spring backbone and tubular luminal restraint; they used a master–slave structure with tendon actuation to achieve the robot function and improved tendon routing, anchoring, and fixation based on the principle of cable actuation for the spring-skeleton bending method. Li et al.¹⁵ studied a back-drivable cable–pulley transmission mechanism for a surgical robot. Compared with conventional non-back-drivable mechanisms based on gear coupling, low friction and back-drivability are achieved by means of a differential cable-driven method. KoreaTech introduced LIMS2-AMBIDEX,¹⁶ a seven-degrees-of-freedom (7-DOF) flexible cable-driven robot, in 2018. The part below the shoulder joint weighs only 2.63 kg, and all the driving motors are arranged at the shoulder joint, which greatly reduces the robot’s self-weight. Unlike the traditional approach of rotating a shaft by rope drive, the wrist joint uses the retracting and releasing of ropes to realize a hemispherical workspace, which closely simulates the working range of the human wrist. Kim et al.¹⁷ introduced a novel bio-inspired cable-driven knee orthosis with a bio-inspired rigid joint structure that is kinematically identical to the shape of the human knee joint on the medial and lateral sides; it can prevent the abnormal loads caused by misalignment and reduce the load transmitted through the musculoskeletal system.

In the research field of cable-driven or flexible robotic arm control schemes, various innovative methods and technologies have emerged. Some studies focus on optimizing the motion trajectory planning of cable-driven robots to enhance their motion efficiency and accuracy. Peng et al.¹⁸ proposed a trajectory tracking framework for cable-driven robotic arms by combining dynamic feedforward control and proportional-derivative (PD) control, with active cable tension and end effector pose as optimization indicators. Xie et al.¹⁹ presented a robust synchronous control scheme in the cable length space to achieve high-precision trajectory tracking of cable-driven robotic arms. Li et al.²⁰ proposed a method based on fuzzy control to adjust the stiffness coefficient of the robotic arm to enhance the robustness of cable-driven robotic arms. Fareh et al.²¹ introduced an advanced robust disturbance rejection control for flexible link manipulators to track desired trajectories in joint space and minimize link vibrations.

With the continuous development of artificial intelligence technology, autonomous learning algorithms based on deep learning and reinforcement learning have been applied more and more widely. Among them, deep reinforcement learning (DRL)^22,23 provides a way to solve the trajectory planning problem in high-dimensional continuous state and action spaces. In recent years, many researchers have used DRL algorithms to solve robot trajectory tracking control and path planning problems in complex environments.^24–26 Zhao et al.²⁷ studied robot trajectory tracking control based on reinforcement learning and proposed a model-based actor-critic learning method, which effectively improved robot trajectory tracking accuracy and shortened the long learning period of the control strategy. In addition, Zhao et al.²⁸ studied robot impedance control based on reinforcement learning, which can effectively improve the stability of robot interaction with the environment. Zhong et al.²⁹ proposed a DRL-based path planner for a welding manipulator, introducing an inverse kinematics module to provide prior knowledge that improves learning efficiency and designing a gain module to avoid locally optimal strategies. Wagaa et al.³⁰ developed different deep learning networks for solving the inverse kinematics problem of 6-DOF robotic manipulators. To address the difficulty of finding an optimal strategy when DRL is applied to trajectory planning, Zheng et al.³¹ tackled the poor convergence at two levels, the action selection strategy and the reward function: they designed a dynamic action selection strategy and proposed a combined reward function that combines the artificial potential field method with a time–energy function. Sun et al.³² proposed a novel DRL-based motion planning method for mobile robots, called reconfigurable structure of deep deterministic policy gradient, which can adaptively change the network structure.

Based on the design concepts of man–machine cooperation, high speed, and low inertia, this article designs a humanoid cable-driven hybrid robot (CDHR) with 6 DOFs, driven by flexible cables, that imitates the structure of the human arm. To address the joint coupling phenomenon, a decoupling method is proposed, and a kinematic model is established to realize high-precision motion simulation. Because of the nonlinear coupling characteristics and the uncertainty of the dynamics parameters, and in view of the robot’s hybrid structure, a controller is realized by combining DC-PID control and TD3 DRL, and position tracking and orientation training simulations are carried out. Finally, the effectiveness of the decoupling method and of the controller is verified by trajectory tracking experiments.

Mechanical design and modeling

The novel humanoid CDHR designed in this article mimics the human arm, which is mainly composed of three parts: a cable–spool–pulley system, a series mechanism, and a coaxial spherical parallel mechanism (SPM). The overall structure is a humanoid arm with a spherical wrist, and the wrist has the full DOF of roll, pitch, and yaw. As shown in Figure 1 for three-dimensional (3D) model of the humanoid CDHR, the robot has the advantages of low self-weight, small moment of inertia, and good dynamics performance.

Figure 1.

A three-dimensional model of a humanoid CDHR. CDHR: cable-driven hybrid robot.

To improve the safety of human–robot interaction, the robot is cable-driven, which guides the ropes through the guide grooves and guide wheels inside the robot, and realizes the coordinated expansion and retraction of 12 cables by controlling six groups of cable-driven units. Each pair of flexible cables is connected through a slider on the ball screw to achieve bidirectional motion of the joints. The internal wiring of the robot is shown in Figures 2 and 3. Figure 2 shows the routing of the cables that drive the parallel mechanism of the robotic arm. Considering that it would be difficult to discern the position and function of each cable if all the cable routings were displayed in a single image, we divided the cable routings based on different driving joints into subfigures (a), (b), (c), and (d) and differentiated them using different colors. The red, yellow, and blue cables in Figure 2 control the rotation of the base joints of the corresponding red, green, and blue links shown on the right side of Figure 9.

Figure 2.

Overall wiring diagram of the parallel mechanism driving flexible cables.

Figure 3.

Overall wiring diagram of the series mechanism driving the flexible cable.

Figure 3 shows the routing of the cables driving the serial mechanism of the robotic arm. Similarly, we have divided the cable routings based on different driving joints into subfigures (a), (b), (c), and (d) and differentiated them using different colors. The green and pink cables in Figure 3(a) and (b) collectively control the rotation of the link labeled as ③ in Figure 3(a), and the blue and purple cables in Figure 3(c) and (d) collectively control the rotation of the link labeled as ② in Figure 3(a).

Simscape is a physical modeling toolbox integrated under Simulink, which can imitate real objects for modeling and simulation, and it is widely used in the simulation of physics, mechanical engineering, and other fields.³³ The humanoid CDHR designed in this article is physically modeled in Simscape. Through the coordinate transformation and motion relationship between various parts of the robot, the simulation model as shown in Figure 4 is finally constructed. It is worth noting that in the simulation model in Figure 4, to improve the simulation efficiency, the linear guide module underneath the base of the robotic arm on the right side of the 3D model in Figure 1 has been simplified and replaced with a cable reel, as shown in Figure 4. The winding and unwinding of the driving cable is controlled by controlling the rotation of the cable reel.

Figure 4.

Simscape system diagram.

The simulation model in Figure 4 is divided into two modules: the kinematics solving module and the manipulator simulation module. The kinematics solving module consists of three main components: trajectory planning, inverse kinematics solving, and decoupling. The manipulator simulation module includes a cable–spool–pulley module and an environment configuration module. In Figure 4, t represents the current simulation time, x, y, and z represent the end position of the manipulator, and $r_{x}$, $r_{y}$, and $r_{z}$ represent its end posture. First, the time variable t is input into the trajectory planning module, which calculates the desired position (x, y, z) and posture ($r_{x}$, $r_{y}$, $r_{z}$) of the manipulator’s end based on a preset trajectory. These desired values are passed to the inverse kinematics solving module, which computes the desired joint angles (d₁, d₂, d₃, d₄, d₅, and d₆) using the inverse kinematics equations specific to this manipulator. Next, these six desired joint angles are fed into the decoupling module, which reduces the coupling effects between joints and outputs a new set of decoupled desired joint angles (d₁, d₂, d₃, d₄, d₅, and d₆). Finally, the decoupled joint angles (d₁–d₆) are fed into the manipulator simulation module, where they drive the ropes through the pulley group system to control each joint, so that each joint reaches its corresponding angle and the model reproduces realistic movements.

The 3-DOF coaxial SPM^34,35 is used as the wrist joint of the robot. Figure 5 shows the kinematic model of this 3-DOF coaxial spherical parallel wrist at the end of the robotic arm. Kinematic chains 1, 2, and 3 in Figure 5 correspond to the red, green, and blue link configurations in Figure 9. The moving platform in Figure 5 corresponds to the platform numbered 3 in Figure 8. Each of the three kinematic chains in Figure 5 is mainly composed of a base, a proximal end, a distal end, and a platform, and the base joints of the three chains are coaxial. Because the arc-shaped connecting cable has the property of bending in different planes, the structure ensures that the axes of the rotating pairs on the three branch chains intersect at the same point (the center of the sphere), and the movement area of the platform is a sphere. This configuration can effectively control the motion orientation of the moving platform, with strong bearing performance, high control accuracy, fast feedback speed, compact structure, and no negative motion pairs.

Simscape Multibody also provides physical modeling of cables, and the cable parts are constructed using the Belt-cable spool, Pulley, and Belt-cable properties modules. As shown in Figure 6, taking the cable-driven unit of the forearm joint as an example, parameters such as the initial angle and winding radius of the flexible cable are set through the Belt-cable spool module (No.① in Figure 6) and the Pulley module (No.② in Figure 6). The Belt-cable properties module (No.③ in Figure 6) is used to modify the inherent properties of the cable.

Figure 5.

Three rotary joints (3-RRR) coaxial spherical parallel mechanism.

Figure 6.

Cable-driven unit of forearm joint.

Kinematics analysis and simulation

The kinematic analysis of the humanoid CDHR is carried out. The schematic diagram of the robot mechanism is shown in Figure 7. In the figure, joint 1 is the waist rotary joint, joint 2 is the big arm pitching joint, joint 3 is the forearm pitching joint, and joints 4, 5, and 6 are the coaxial parallel joints. l_i (i = 1, 2, 3) is the rod length of the mechanical arm. The movements of each joint are described mainly in terms of the base coordinate system U and the end-effector coordinate system T . According to the robot configuration, the position and orientation of the end-effector of the robot are determined by the series mechanism and the coaxial SPM, respectively. Therefore, the kinematic model of the two parts is established respectively, and the overall kinematic model is obtained.

Figure 7.

Schematic diagram of robot mechanism.

Kinematics analysis of 3-DOF series mechanism

The kinematic model of the 3-DOF series mechanism is established by using the D-H parameter method. The coordinate system for the robotic arm is established as depicted in Figure 8 and is used to derive the D-H parameter table shown in Table 1. In the table, d₁, a₂, and a₃ correspond to the lengths l₁, l₂, and l₃ in Figure 7, respectively.

Figure 8.

D-H coordinate system of series mechanism.

Table 1.

D-H parameters of series mechanism.

i	$θ_{i}$ (°)	$d_{i}$ (mm)	$a_{i}$ (mm)	$α_{i}$ (°)
1	$θ_{1}$	$l_{1}$	0	90
2	$θ_{2}$	0	$l_{2}$	0
3	$θ_{3}$	0	$l_{3}$	0

Through D-H coordinate transformation, the calculation formula of transformation matrix of 3-DOF series mechanism can be written out

A_{i} = Rot (z, θ_{i}) \cdot Trans (0, 0, d_{i}) \cdot Trans (a_{i}, 0, 0) \cdot Rot (x, α_{i})

Then the forward kinematic transformation matrix is

T = [\begin{matrix} c_{1} c_{23} & - c_{1} s_{23} & s_{1} & c_{1} (a_{2} c_{2} + a_{3} c_{23}) \\ s_{1} c_{23} & - s_{1} s_{23} & - c_{1} & s_{1} (a_{2} c_{2} + a_{3} c_{23}) \\ s_{23} & c_{23} & 0 & a_{2} s_{2} + a_{3} s_{23} \\ 0 & 0 & 0 & 1 \end{matrix}]

where $s_{i} = sin (θ_{i})$ , $c_{i} = cos (θ_{i})$ .
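The chain of D-H transforms above can be checked numerically. The following is a minimal sketch, assuming the row order of Table 1 and angles in radians; the function names and the link lengths in the usage line are hypothetical and chosen only for illustration. Multiplying the three Table 1 rows places the offset d₁ in the z entry of the position column, so passing `d1 = 0.0` reproduces the position column exactly as printed above.

```python
import math

def dh_transform(theta, d, a, alpha):
    """Rot(z, theta) * Trans(0, 0, d) * Trans(a, 0, 0) * Rot(x, alpha) as a 4x4 list."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul4(m, n):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(m[r][k] * n[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def forward_kinematics(theta1, theta2, theta3, d1, a2, a3):
    """Base-to-end transform of the 3-DOF series mechanism (rows of Table 1)."""
    t = dh_transform(theta1, d1, 0.0, math.pi / 2)       # joint 1: alpha = 90 deg
    t = matmul4(t, dh_transform(theta2, 0.0, a2, 0.0))   # joint 2
    t = matmul4(t, dh_transform(theta3, 0.0, a3, 0.0))   # joint 3
    return t

# Hypothetical link lengths in mm, used only to exercise the functions.
T = forward_kinematics(math.radians(30), math.radians(45), math.radians(-20),
                       0.0, 200.0, 150.0)
print([round(row[3], 3) for row in T[:3]])  # end-effector position (x, y, z)
```

Plain nested lists are used so that the sketch needs only the standard library; a matrix library would be the natural choice in practice.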

In engineering application, inverse kinematics is the key step to realize the robot motion control. The commonly used methods include numerical method, analytical method, and geometric method. In this article, the analytical method is used to solve the angle value of each joint of the robot.

It should be noted that in the process of solving the inverse kinematics of the robotic arm, the three joint angles $θ_{1}$ , $θ_{2}$ , and $θ_{3}$ of the robotic arm in serial structure are all unknown variables. Therefore, the values of $θ_{1}$ , $θ_{2}$ , and $θ_{3}$ cannot be solved through the transformation matrix T at this stage.

From the fourth column of equation (2), we can know the position of the end-effector in space

\left\{ \begin{matrix} p_{x} = c_{1} (a_{2} c_{2} + a_{3} c_{23}) \\ p_{y} = s_{1} (a_{2} c_{2} + a_{3} c_{23}) \\ p_{z} = a_{2} s_{2} + a_{3} s_{23} \end{matrix} \right.

Firstly, the angle of joint 3 is solved

p_{x}^{2} + p_{y}^{2} + p_{z}^{2} = a_{2}^{2} + a_{3}^{2} + 2 a_{2} a_{3} c_{3}

It is easy to obtain from the above formula

c_{3} = \frac{p_{x}^{2} + p_{y}^{2} + p_{z}^{2} - a_{2}^{2} - a_{3}^{2}}{2 a_{2} a_{3}}

s_{3} = \pm \sqrt{1 - c_{3}^{2}}

thus

θ_{3} = atan 2 (s_{3}, c_{3})

According to the sign of s₃, two solutions are given

\left\{ \begin{matrix} θ_{3, 1} \in [- π, π] \\ θ_{3, 2} = - θ_{3, 1} \end{matrix} \right.

Then the angle of joint 2 is found. Eliminating s₁ and c₁ from the expressions for $p_{x}$ and $p_{y}$ gives $p_{x}^{2} + p_{y}^{2} = {(a_{2} c_{2} + a_{3} c_{23})}^{2}$ , which is combined with the expression for $p_{z}$ to obtain

c_{2} = \frac{\pm \sqrt{p_{x}^{2} + p_{y}^{2}} (a_{2} + a_{3} c_{3}) + p_{z} a_{3} s_{3}}{a_{2}^{2} + a_{3}^{2} + 2 a_{2} a_{3} c_{3}}

s_{2} = \frac{p_{z} (a_{2} + a_{3} c_{3}) \mp \sqrt{p_{x}^{2} + p_{y}^{2}} a_{3} s_{3}}{a_{2}^{2} + a_{3}^{2} + 2 a_{2} a_{3} c_{3}}

According to equations (3) to (8), we can know that the angle of joint 3 has two solutions, and the angle of joint 2 has four values of four-quadrant arctangent according to equations (9) and (10). Therefore, theoretically, there are eight combinations of the forms of solutions, but excluding the four groups of solutions that do not conform to ${cos}^{2} θ_{2} + {sin}^{2} θ_{2} = 1$ , there are only four other groups of solutions. In this way, multiple groups of solutions of joint 2 can be obtained, whose solutions are shown as follows

\left\{ \begin{matrix} θ_{2, 1} = atan 2 ((a_{2} + a_{3} c_{3}) p_{z} - a_{3} s_{3}^{+} \sqrt{p_{x}^{2} + p_{y}^{2}}, (a_{2} + a_{3} c_{3}) \sqrt{p_{x}^{2} + p_{y}^{2}} + a_{3} s_{3}^{+} p_{z}) \\ θ_{2, 2} = atan 2 ((a_{2} + a_{3} c_{3}) p_{z} + a_{3} s_{3}^{+} \sqrt{p_{x}^{2} + p_{y}^{2}}, - (a_{2} + a_{3} c_{3}) \sqrt{p_{x}^{2} + p_{y}^{2}} + a_{3} s_{3}^{+} p_{z}) \\ θ_{2, 3} = atan 2 ((a_{2} + a_{3} c_{3}) p_{z} - a_{3} s_{3}^{-} \sqrt{p_{x}^{2} + p_{y}^{2}}, (a_{2} + a_{3} c_{3}) \sqrt{p_{x}^{2} + p_{y}^{2}} + a_{3} s_{3}^{-} p_{z}) \\ θ_{2, 4} = atan 2 ((a_{2} + a_{3} c_{3}) p_{z} + a_{3} s_{3}^{-} \sqrt{p_{x}^{2} + p_{y}^{2}}, - (a_{2} + a_{3} c_{3}) \sqrt{p_{x}^{2} + p_{y}^{2}} + a_{3} s_{3}^{-} p_{z}) \end{matrix} \right.

Finally, the angle of joint 1 is solved, which is obtained from equation (3)

\left\{ \begin{matrix} θ_{1, 1} = atan 2 (p_{y}, p_{x}) \\ θ_{1, 2} = atan 2 (- p_{y}, - p_{x}) \end{matrix} \right.

Permuting the joint angles, according to equations (8), (11), and (12), it can be seen that there are four groups of inverse solutions for the 3-DOF series mechanism

\left\{ \begin{matrix} (θ_{1, 1}, θ_{2, 1}, θ_{3, 1}) \\ (θ_{1, 1}, θ_{2, 3}, θ_{3, 2}) \\ (θ_{1, 2}, θ_{2, 2}, θ_{3, 1}) \\ (θ_{1, 2}, θ_{2, 4}, θ_{3, 2}) \end{matrix} \right.
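The analytical procedure above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names and the numeric target are hypothetical, angles are in radians, and $p_{z}$ is assumed to be measured from the axis of joint 2 so that it matches the position equations as written. The value of c₃ is obtained by rearranging the squared-distance identity $p_{x}^{2} + p_{y}^{2} + p_{z}^{2} = a_{2}^{2} + a_{3}^{2} + 2 a_{2} a_{3} c_{3}$, and the four branches are returned in the same order as the four groups listed above.

```python
import math

def position(theta1, theta2, theta3, a2, a3):
    """End position from the position equations of the series mechanism."""
    reach = a2 * math.cos(theta2) + a3 * math.cos(theta2 + theta3)
    height = a2 * math.sin(theta2) + a3 * math.sin(theta2 + theta3)
    return (math.cos(theta1) * reach, math.sin(theta1) * reach, height)

def inverse_kinematics(px, py, pz, a2, a3):
    """Return the four (theta1, theta2, theta3) branches for a target position."""
    c3 = (px * px + py * py + pz * pz - a2 * a2 - a3 * a3) / (2.0 * a2 * a3)
    if abs(c3) > 1.0:
        raise ValueError("target is outside the reachable workspace")
    s3_plus = math.sqrt(1.0 - c3 * c3)
    r = math.hypot(px, py)
    solutions = []
    for sign in (1.0, -1.0):              # theta_{1,1} then theta_{1,2}
        theta1 = math.atan2(sign * py, sign * px)
        signed_r = sign * r               # sign chosen for the square root
        for s3 in (s3_plus, -s3_plus):    # theta_{3,1} then theta_{3,2}
            theta3 = math.atan2(s3, c3)
            k1 = a2 + a3 * c3
            k2 = a3 * s3
            theta2 = math.atan2(k1 * pz - k2 * signed_r,
                                k1 * signed_r + k2 * pz)
            solutions.append((theta1, theta2, theta3))
    return solutions

# Hypothetical target (mm) and link lengths (mm), for illustration only.
target = (180.0, 90.0, 120.0)
for sol in inverse_kinematics(*target, 200.0, 150.0):
    print([round(math.degrees(q), 2) for q in sol],
          [round(c, 6) for c in position(*sol, 200.0, 150.0)])
```

Feeding each branch back through the position equations, as the loop does, is a quick way to confirm that all four groups reach the same target.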

Kinematics analysis of 3-DOF coaxial SPM

As shown in Figure 9, No.① represents the proximal link, No.② represents the distal link, and No.③ represents the mobile platform. Unlike the general spherical three-rotary-joint (3-RRR) parallel mechanism,^36,37 the rotation axis of the proximal links of the 3-DOF SPM designed in this article is defined as the vertical direction ( $γ$ = 0), so the pyramid located below disappears and the mechanism becomes an SPM whose base joints share the same axis. The three equally spaced kinematic chains of the mechanism are numbered i = 1, 2, 3 in the counterclockwise direction. Each kinematic chain is divided into a base, a proximal link, a distal link, and a moving platform. The curvature of the proximal link is defined as $α_{1}$ and that of the distal link as $α_{2}$ . Each kinematic chain has three joints, and the axes of all joints intersect at the center of rotation, which is represented by O. The axes of the three joints are defined as unit vectors ${\vec{u}}_{i}$ , ${\vec{v}}_{i}$ , ${\vec{w}}_{i}$ (i = 1, 2, 3), where the angle between ${\vec{u}}_{i}$ and ${\vec{w}}_{i}$ is defined as $α_{1}$ and the angle between ${\vec{w}}_{i}$ and ${\vec{v}}_{i}$ is defined as $α_{2}$ . The angle between adjacent ${\vec{v}}_{i}$ is defined as $α_{3}$ . A coordinate system is established in the initial state of the mechanism: the geometric origin of the moving platform is taken as the origin of the coordinate system, the vertical direction is the z-axis, the axis direction of the middle rotating axis of the first kinematic chain is the x-axis, and the direction of the y-axis is determined by the right-hand rule.

Figure 9.

Diagram of parallel mechanism.

From the above definition, we can write the unit vector u_i for the first joint of the three motion chains

u_{i} = [\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}] (i = 1, 2, 3)

The expression of unit vector w_i of the intermediate joint is

w_{i} = [\begin{matrix} cos (η_{i} - θ_{i}) sin α_{1} \\ sin (η_{i} - θ_{i}) sin α_{1} \\ - cos α_{1} \end{matrix}]

where $η_{i} = 2 π (i - 1) / 3$ , i = 1, 2, 3.

The unit vector v_i of the third joint represents the direction of the moving platform, and there are constraints with the other two unit vectors. According to the geometric constraints, the following equation can be obtained

\left\{ \begin{matrix} w_{i} \cdot v_{i} = cos α_{2} (i = 1, 2, 3) \\ v_{i} \cdot v_{j} = cos α_{3} (i, j = 1, 2, 3; i \neq j) \\ ‖ v_{i} ‖ = 1 (i = 1, 2, 3) \end{matrix} \right.

where $α_{3} = 2 {sin}^{- 1} (sin β cos \frac{π}{6})$ .

Equation (16) gives nine equations, which are solved numerically starting from an initial value x ₀, defined as shown in equation (19). The SPM designed in this article uses the l-l-l assembly mode: all the distal links are located on the left side of their central plane of symmetry, and v_i always points in the positive direction of the rotation of its branch chain. The initial vector x ₀ is therefore taken as the unit vector w_i rotated in the positive direction about the z-axis. Assuming that the unit vector w_i rotates $10 °$ about the z-axis, that is, λ = $10 °$ , the Rodrigues rotation formula gives

w_{i, rot} = w_{i} cos λ + (w_{i} \times \vec{k}) sin λ + \vec{k} (\vec{k} \cdot w_{i}) (1 - cos λ)

where $\vec{k}$ represents the unit vector in the same direction as the z-axis.

Substituting the value of $\vec{k}$ into the equation, we have

w_{i, rot} = w_{i} cos λ + (w_{i} \times [\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}]) sin λ + [\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}] ([\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}] \cdot w_{i}) (1 - cos λ) (i = 1, 2, 3)

where $\vec{k}$ = $[\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}]$ .

The initial value is obtained as

x_{0} = {[\begin{matrix} w_{1 x, rot} & w_{1 y, rot} & w_{1 z, rot} \\ w_{2 x, rot} & w_{2 y, rot} & w_{2 z, rot} \\ w_{3 x, rot} & w_{3 y, rot} & w_{3 z, rot} \end{matrix}]}^{T}

According to the above equation, the direction vector v_i of the moving platform can be obtained

\left\{ \begin{matrix} v_{1} = {[w_{1 x, rot}, w_{1 y, rot}, w_{1 z, rot}]}^{T} \\ v_{2} = {[w_{2 x, rot}, w_{2 y, rot}, w_{2 z, rot}]}^{T} \\ v_{3} = {[w_{3 x, rot}, w_{3 y, rot}, w_{3 z, rot}]}^{T} \end{matrix} \right.

To facilitate the observation of orientation changes of the moving platform, the unit normal vector n of the moving platform is used to represent its orientation

\left\{ \begin{matrix} n = \frac{v_{1} + v_{2} + v_{3}}{‖ v_{1} + v_{2} + v_{3} ‖}, \forall β \neq 90 ° \\ n = \frac{v_{1} \times v_{2}}{‖ v_{1} \times v_{2} ‖}, \forall β = 90 ° \end{matrix} \right.

Thus, the unique solution of the corresponding forward kinematics problem in l-l-l assembly mode can be obtained. Its specific solution process is shown in the algorithm below.

The inverse kinematics of the coaxial SPM is solved below. Given the direction v_i , i = 1, 2, 3 of the moving platform, the angle $θ_{i}$ ,i = 1, 2, 3 of the underlying platform is solved.

According to the first equation of equation (16), the following expression can be obtained

A_{i} T_{i}^{2} + 2 B_{i} T_{i} + C_{i} = 0 (i = 1, 2, 3)

Among them

\left\{ \begin{matrix} T_{i} = tan (\frac{θ_{i}}{2}), i = 1, 2, 3 \\ A_{i} = - cos α_{2} - sin α_{1} sin η_{i} v_{i y} - sin α_{1} cos η_{i} v_{i x} - cos α_{1} v_{i z} \\ B_{i} = sin α_{1} sin η_{i} v_{i x} - sin α_{1} cos η_{i} v_{i y} \\ C_{i} = - cos α_{2} + sin α_{1} cos η_{i} v_{i x} + sin α_{1} sin η_{i} v_{i y} - cos α_{1} v_{i z} \end{matrix} \right.

Algorithm 1.

Unique solution of forward kinematics of coaxial SPM.

Input: $α_{1}$ , $α_{2}$ , $β$ , x ₀, $λ$ , $θ_{i}$ , i = 1, 2, 3
Output: Moving platform normal vector n.
1 Calculate $η_{i} = 2 π (i - 1) / 3$ , i = 1, 2, 3;
2 Calculate $α_{3} = 2 {sin}^{- 1} (sin β cos \frac{π}{6})$ ;
3 for i $\leftarrow$ 1 to 3 do
4     Calculate u_i ;
5     Calculate w_i using equation (15) given $θ_{i}$ ;
6     Calculate $w_{i, rot}$ using equation (19) given $λ$ ;
7 end for
8 Calculate initial guess vector x ₀ using equation (20);
9 Calculate v_i , i = 1, 2, 3, by solving the system of equation (16) numerically, given w_i , i = 1, 2, 3, with x ₀;
10 Calculate moving platform normal vector n using equation (22) given v_i , i = 1, 2, 3;
11 return n

According to equations (23) and (24), the angle $θ_{i}$ (i = 1, 2, 3) of the lower platform corresponding to the target orientation can be solved. Eight groups of coaxial SPM kinematic solutions can be obtained from the above equations. These solutions correspond to eight different coaxial SPM orientations and define various assembly modes of the parallel mechanism in various initial states. The pose of the 3D model after input $θ$ = [ $30 °$ , $45 °$ , $60 °$ ] is shown in Figure 10. In this article, l-l-l orientation is selected as the assembly mode of coaxial SPM model and prototype. Through the forward kinematics numerical verification and 3D model pose verification, the unique solution of the inverse kinematics of the parallel mechanism can be obtained. The solution process is shown in the algorithm:

Figure 10.

Orientations of eight groups inverse kinematics solutions of SPM. SPM: spherical parallel mechanism.

Algorithm 2.

Unique solution of inverse kinematics of coaxial SPM.

Input: $α_{1}$ , $α_{2}$ , v_i , i = 1, 2, 3.
Output: Input joint positions vector $θ_{i}$ , i = 1, 2, 3.
1 Calculate $η_{i} = 2 π (i - 1) / 3$ , i = 1, 2, 3;
2 for i $\leftarrow$ 1 to 3 do
3     Calculate A_i , B_i , C_i using equation (24) given v_i and $η_{i}$ ;
4     Solve T_i using equation (23);
5     Solve $θ_{i}$ using equation (24);
6 end for
7 The l-l-l orientation is selected as the assembly mode of the coaxial SPM. Through the forward kinematics numerical verification and 3D model pose verification, the unique solution of the inverse kinematics of the parallel mechanism is obtained;
8 return $θ_{i}$ , i = 1, 2, 3.

Kinematic coupling analysis of joints

The cable-driven joint robot uses flexible cables as the driving medium. On the one hand, this greatly improves the load–deadweight ratio of the robot; on the other hand, it reduces the power consumption of the robot and, owing to the flexibility of the cable itself, improves the safety of human–computer interaction. However, when cables are used to drive the robot joints, the cable-driven units are generally placed at the base, and the routing of the cables inside the robot often leads to motion coupling between multiple joints, which affects the motion accuracy of the robot arm. Based on the structural characteristics of the humanoid CDHR designed in this article, the coupling phenomenon is analyzed, and a decoupling method is proposed.

Pitch joint coupling analysis

Figure 11 is a schematic diagram of the motion coupling between the forearm pitching joint and the big arm pitching joint of the mechanical arm. When the preceding joint i rotates by an angle ${θ^{'}}_{i}$ , the rope wrapping angle on joint i increases by ${θ^{'}}_{i}$ . If the length of the rope between the two joints is unchanged, the next joint i + 1 will deflect by $β_{i + 1}$ . Therefore, angle $β_{i + 1}$ should be compensated by changing the length of the rope between the two joints. The deflection angle $β_{i + 1}$ is determined by the radius of the intermediate spindle of rotation and the radius of joint i + 1. The specific relationship is as follows

R_{Z}^{i} \cdot θ_{i}^{'} = R_{O}^{i + 1} \cdot β_{i + 1}

Figure 11.

Pitch joint coupling diagram.

where $R_{Z}^{i}$ is the radius of the intermediate spindle of rotation, and $R_{O}^{i + 1}$ is the radius of the latter joint i + 1. Therefore, coupling angle $β_{i + 1}$ can be written as

β_{i + 1} = \frac{R_{Z}^{i} \cdot θ_{i}^{'}}{R_{O}^{i + 1}}

After the coupling angle is obtained from the above equation, the angle compensated by the cable-driven unit driving joint i + 1 can be written

γ_{i + 1} = k \frac{R_{O}^{i + 1}}{R_{o}^{i + 1}} β_{i + 1} = \frac{k R_{Z}^{i} \cdot θ_{i}^{'}}{R_{o}^{i + 1}}

Because the motor that drives the cable can rotate forward or in reverse, a coefficient k = ±1 is defined to identify the direction in which the motor should rotate during decoupling. The sign of k depends on the cable routing: it is positive for parallel wiring and negative for cross wiring. $R_{o}^{i + 1}$ is the radius of the cable-driven unit that drives joint i + 1.

When equation (27) is applied to the robot studied in this article, the compensation angle of the flexible cable drive unit of each joint can be obtained

[\begin{matrix} γ_{2} \\ γ_{3} \\ γ_{4} \\ γ_{5} \\ γ_{6} \end{matrix}] = [\begin{matrix} 0 & 0 & 0 & 0 & 0 \\ \frac{R_{Z}^{2}}{R_{o}^{3}} & 0 & 0 & 0 & 0 \\ \frac{R_{Z}^{2}}{R_{o}^{3}} & \frac{R_{Z}^{3}}{R_{o}^{4}} & 0 & 0 & 0 \\ \frac{R_{Z}^{2}}{R_{o}^{3}} & \frac{R_{Z}^{3}}{R_{o}^{4}} & 0 & 0 & 0 \\ - \frac{R_{Z}^{2}}{R_{o}^{3}} & - \frac{R_{Z}^{3}}{R_{o}^{4}} & 0 & 0 & 0 \end{matrix}] [\begin{matrix} θ_{2} \\ θ_{3} \\ θ_{4} \\ θ_{5} \\ θ_{6} \end{matrix}]

Rotary joint coupling analysis

As shown in Figure 12, when the waist joint rotates by $θ^{'}$ , the rope length changes from $l_{i}$ to $L_{i}$ , a change of $Δ l_{i}$ . The key to decoupling is to compensate for this change in rope length so that it does not affect the subsequent joints.

Figure 12.

Rotary joint coupling diagram.

The change in rope length $Δ l_{i}$ can be obtained by the following formula

Δ l_{i} = | L_{i} - l_{i} |

The formulas for calculating $L_{i}$ and $l_{i}$ are as follows

L_{i} = \sqrt{R_{i 1}^{2} + R_{i 2}^{2} - 2 R_{i 1} R_{i 2} cos θ' + H_{i}^{2}}

l_{i} = \sqrt{{(R_{i 1} - R_{i 2})}^{2} + H_{i}^{2}}

where $R_{i 1}$ and $R_{i 2}$ are the radii of the upper and lower circular planes, and $H_{i}$ is the vertical distance between the two circular planes.
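As a quick illustration of the rope-length relations for the rotary joint, the sketch below evaluates $L_{i}$ , $l_{i}$ , and $Δ l_{i}$ for one cable. The function name and the numerical dimensions are assumptions made for the example; only the formulas come from the text.

```python
import math


def rope_length_change(r1: float, r2: float, h: float, theta: float) -> float:
    """Return |L_i - l_i| for a waist rotation of theta (radians).

    r1, r2: radii of the upper and lower circular planes.
    h: vertical distance between the two planes (H_i in the equations).
    """
    rotated = math.sqrt(r1**2 + r2**2 - 2.0 * r1 * r2 * math.cos(theta) + h**2)
    initial = math.sqrt((r1 - r2) ** 2 + h**2)
    return abs(rotated - initial)


# Hypothetical geometry: equal radii of 10, plane spacing of 100, 90 degree turn.
delta_l = rope_length_change(10.0, 10.0, 100.0, math.pi / 2)
print(delta_l)
```

At zero rotation the two lengths coincide, so the function returns 0, which is a useful sanity check on the geometry.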

Kinematic simulation

Verification of decoupling method

To verify the effectiveness of the decoupling method against the joint motion coupling, this article uses the established Multibody model for forward kinematics simulation. Taking the first three joints as an example, the first, second, and third joints are each moved separately. Equation (32) is used to plan the joint-space trajectory with quintic polynomial interpolation, and the joint angles are listed in Table 2. Figure 13 shows the coupling effect that each of the three joints, moving alone, has on the other joints. When the first three joints are rotated individually, the maximum coupling angles are $2.78 °$ , $23.40 °$ , and $18.00 °$ , respectively.

θ (t) = θ_{0} + \frac{10}{t_{f}^{3}} (θ_{f} - θ_{0}) t^{3} - \frac{15}{t_{f}^{4}} (θ_{f} - θ_{0}) t^{4} + \frac{6}{t_{f}^{5}} (θ_{f} - θ_{0}) t^{5}

Table 2.

Robot joint angle information.

$t (s)$	4	8	12
Joint 1 ( $°$ )	10	20	30
Joint 2 ( $°$ )	10	20	30
Joint 3 ( $°$ )	10	20	30

Figure 13.

Coupling effects when the first three joints move alone: (a) joint 1 moves alone, (b) joint 2 moves alone, and (c) joint 3 moves alone.

In equation (32), $θ_{0}$ and $θ_{f}$ are the start and end angles of the joint between two adjacent points, the motion starts at $t_{0}$ and ends at $t_{f}$ , and $t$ is measured from $t_{0}$ .
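The quintic interpolation of equation (32) can be sketched in a few lines. This is an illustrative helper (the name `quintic` is not from the article); the example segment assumes joint 1 starts at 0° and reaches the first Table 2 waypoint of 10° at 4 s.

```python
def quintic(t: float, theta0: float, theta_f: float, t_f: float) -> float:
    """Quintic polynomial of equation (32), with t measured from the segment start."""
    s = t / t_f  # normalised time in [0, 1]
    return theta0 + (theta_f - theta0) * (10.0 * s**3 - 15.0 * s**4 + 6.0 * s**5)


# Assumed segment: 0 deg to 10 deg over 4 s.
samples = [quintic(t, 0.0, 10.0, 4.0) for t in (0.0, 2.0, 4.0)]
print(samples)
```

The polynomial passes through the end angles exactly and through their mean at the midpoint of the segment; its first and second derivatives vanish at both ends, which is why it is a common choice for rest-to-rest joint motion.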

The decoupling method is then added to the joint movement, and the first three joints are driven at the same time. The resulting motion of each joint is shown in Figure 14(a), and the change in the length of the flexible cable is shown in Figure 14(b).

Figure 14.

Decoupling verification: (a) change of joint angle after decoupling, and (b) change of cable length after decoupling.

After the decoupling method is added, the movement of the six joints meets the expectation, and the coupling between the first three joints is nearly eliminated. Small errors remain in the last three joints, with a maximum error of $0.19 °$ . These errors accumulate from the coupling caused by the motion of the first three joints and arise because the change of the tangency point of the cable on the pulley is not considered. Since the error is small, the decoupling method is feasible and effective.

Kinematic verification

The Multibody model is used for kinematic verification. With the decoupling method and the coupling relation taken into account, each joint angle is converted into the angle of the corresponding cable-driven unit, and the motion of the humanoid CDHR is controlled through the flexible cables. First, trajectory planning is carried out for the end-effector position, with the trajectory given by equations (33) and (34); position control of the end-effector is realized by inverse kinematics combined with the decoupling method. Second, $θ_{i}$ = [ $60 °$ , $90 °$ , $120 °$ ] is input to the coaxial SPM, and the quintic polynomial interpolation of equation (32) is used to verify the change of the end-effector orientation

\begin{cases} x = 6 t^{2} - 0.4 t^{3} \\ y = 0 \\ z = l_{1} + l_{2} - 1.5 t^{2} + 0.1 t^{3} \end{cases} \quad t < 10

\begin{cases} x = 200 + 6.25 \times 10^{- 3} \times (t - 10)^{3} \times \sin (2 (t - 10)) \\ y = 6.25 \times 10^{- 3} \times (t - 10)^{3} \times \cos (2 (t - 10)) \\ z = 505.5 - 6.25 \times 10^{- 3} \times (t - 10)^{3} \end{cases} \quad t \geq 10
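The two-segment desired trajectory can be evaluated as a single function, as sketched below. The function name is hypothetical, and the default `l_sum` (standing for $l_{1} + l_{2}$ ) is an assumption: 555.5 is simply the value that makes $z$ continuous at $t = 10$ , not a dimension quoted in this section.

```python
import math


def desired_position(t: float, l_sum: float = 555.5) -> tuple:
    """Desired end-effector position (x, y, z) at time t.

    l_sum stands for l_1 + l_2; the default is an assumed value chosen
    so that z is continuous where the two segments meet at t = 10.
    """
    if t < 10.0:
        x = 6.0 * t**2 - 0.4 * t**3
        y = 0.0
        z = l_sum - 1.5 * t**2 + 0.1 * t**3
    else:
        s = t - 10.0
        x = 200.0 + 6.25e-3 * s**3 * math.sin(2.0 * s)
        y = 6.25e-3 * s**3 * math.cos(2.0 * s)
        z = 505.5 - 6.25e-3 * s**3
    return x, y, z


before = desired_position(10.0 - 1e-6)
after = desired_position(10.0)
print(before, after)
```

Both segments meet at (200, 0, 505.5) under this assumption, and the first segment arrives there with zero velocity in x and z, so the hand-over to the spiral segment is smooth.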

Figure 15 shows the cable length variation, trajectory tracking and error, and orientation change and error. Figures 15(b) and 15(c) verify that the end-effector position changes meet expectations. According to Figure 15(c), the error between the actual and expected trajectories in the simulation is within 0.25 mm; the maximum error is 0.22 mm on the X-axis, 0.19 mm on the Y-axis, and 0.01 mm on the Z-axis. Figures 15(d) to (f) verify that the end-effector orientation changes meet expectations. Because proportional-integral-derivative (PID) control is used here, the trajectory tracking error of the robotic arm increases gradually at the beginning and then oscillates repeatedly within a certain range during the motion. The error between the actual and expected roll-pitch-yaw (rpy) angles in the simulation is within 0.0015 rad. The error arises from inaccuracies in the coordinate transformation of the Multibody model and because the coupling compensation of the rotary joint does not consider the change of the tangent point between the cable-driven unit and the pulley.

Figure 15.

Kinematic verification: (a) diagram of cable length variation, (b) trajectory tracking diagram, (c) trajectory error diagram, (d) orientation comparison chart, (e) diagram of orientation change, and (f) orientation error diagram.

From the simulation results, it can be seen that the end-effector pose meets the expectations with small errors, which verifies the effectiveness and correctness of the decoupling method and kinematic algorithm.

Control method design

The humanoid CDHR studied in this article exhibits complex nonlinearity, coupling characteristics, and uncertain dynamics parameters, and the stress of the flexible cable must also be considered. This article therefore exploits the hybrid structure of the robot and combines DC-PID control with the TD3 DRL algorithm for dynamics control, so as to achieve trajectory tracking and orientation control.

The complete control flow of the CDHR is shown in Figure 16. Based on the predefined desired trajectory, the desired positions X_desired, Y_desired, and Z_desired of the end effector of the serial part of the robotic arm at the current time are obtained. Then, the desired joint angles are calculated through inverse kinematics and joint decoupling and input into the PID controller to obtain the joint torques $τ_{1}$ , $τ_{2}$ , and $τ_{3}$ , which are then input into the simulation model. Because of joint coupling, the end effector of the parallel mechanism also rotates. The velocities ${\vec{v}}_{1}$ , ${\vec{v}}_{2}$ , and ${\vec{v}}_{3}$ of the parallel mechanism’s end effector are therefore taken as observations and rewards and input into the TD3 model. TD3 provides the rotational torques $τ_{4}$ , $τ_{5}$ , and $τ_{6}$ for the base joints of the parallel mechanism, which are then input into the simulation model.

Figure 16.

Control system block diagram.

PID control based on decoupling method

Combined with the characteristics of the humanoid CDHR studied in this article, the DC-PID control method is adopted to realize the position control of the end-effector. PID control algorithm is simple, robust, and reliable. Classical PID control algorithm can be discretely expressed as

u (k) = K_{P} e (k) + K_{D} (e (k) - e (k - 1)) + K_{I} \sum_{n = 0}^{k} e (n)
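The discrete PID law above maps directly onto a small stateful controller. The class below is a minimal single-axis sketch, not the authors' Simulink implementation; the class name is hypothetical, the previous error is assumed to be zero before the first sample, and the gains are the per-axis DC-PID values listed in Table 3.

```python
class DiscretePID:
    """u(k) = K_P e(k) + K_D (e(k) - e(k-1)) + K_I * sum of e(n) for n = 0..k."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.error_sum = 0.0   # running sum of e(n)
        self.prev_error = 0.0  # e(k-1), assumed 0 before the first sample

    def update(self, error: float) -> float:
        self.error_sum += error
        difference = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.kd * difference + self.ki * self.error_sum


pid = DiscretePID(kp=15.0, ki=0.5, kd=1.0)
u0 = pid.update(1.0)  # first sample: error 1.0
u1 = pid.update(0.5)  # second sample: error 0.5
print(u0, u1)
```

Because the sampling period is absorbed into the gains in this form, changing the sample time would require rescaling K_I and K_D.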

To realize position control of the humanoid CDHR based on the angular dynamics model, the inverse kinematics must first be solved to transform the Cartesian space into the joint space and obtain the desired trajectories $θ_{d} (t)$ of the series joints. However, because the robot transmits forces and moments through flexible cables that are coupled, the desired trajectory $q_{d} (t)$ of each cable-driven unit, including the coupling relationship, must then be obtained through the radius relationship and the decoupling method. The controller also requires the first and second derivatives of the desired trajectory of each cable-driven unit. Deriving these analytically is complicated, so for simplicity the following third-order integral chain differentiator is used to obtain ${\dot{q}}_{d} (t)$ and ${\ddot{q}}_{d} (t)$

\begin{cases} {\dot{x}}_{1} = x_{2} \\ {\dot{x}}_{2} = x_{3} \\ {\dot{x}}_{3} = - \frac{k_{1}}{ε^{3}} (x_{1} - q_{d}) - \frac{k_{2}}{ε^{2}} x_{2} - \frac{k_{3}}{ε} x_{3} \end{cases}

where the state $x_{1}$ tracks $q_{d} (t)$ , and the outputs $x_{2}$ and $x_{3}$ of the differentiator approximate ${\dot{q}}_{d} (t)$ and ${\ddot{q}}_{d} (t)$ . To suppress the peaking phenomenon in the differentiator, during the initial interval 0 ≤ t ≤ 1.0, take

ε = \frac{1}{100} (1 - e^{- 2 t})
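To make the behaviour of the integral chain differentiator concrete, the sketch below integrates it with forward Euler for a sine input. Several choices here are assumptions rather than values from the article: the gains $k_{1} = 1$ , $k_{2} = 3$ , $k_{3} = 3$ (chosen so that $s^{3} + 3 s^{2} + 3 s + 1$ is Hurwitz), a fixed $ε = 0.01$ instead of the time-varying schedule, the 1 ms step, and the function names.

```python
import math


def integral_chain_step(x, q_d, eps, k, dt):
    """One forward-Euler step of the third-order integral chain differentiator."""
    x1, x2, x3 = x
    k1, k2, k3 = k
    dx3 = -k1 / eps**3 * (x1 - q_d) - k2 / eps**2 * x2 - k3 / eps * x3
    return (x1 + dt * x2, x2 + dt * x3, x3 + dt * dx3)


def differentiate(signal, t_end, dt=1e-3, eps=0.01, k=(1.0, 3.0, 3.0)):
    """Run the differentiator on signal(t) from 0 to t_end; return the final state."""
    x = (signal(0.0), 0.0, 0.0)
    for n in range(int(round(t_end / dt))):
        x = integral_chain_step(x, signal(n * dt), eps, k, dt)
    return x


x1, x2, x3 = differentiate(math.sin, t_end=2.0)
print(x1, x2, x3)
```

In this run the first state follows the input and the second state follows its derivative, each with a small phase lag of roughly $3 ε$ seconds. A smaller $ε$ reduces the lag but makes the system stiffer, so the integration step must shrink accordingly.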

So the following control law is obtained by combining the classical PID control

u (t) = K_{P} e (t) + K_{D} \dot{e} (t) + K_{I} \int e (t) \, dt

where $e (t) = q_{d} (t) - q (t)$ , $\dot{e} (t) = {\dot{q}}_{d} (t) - \dot{q} (t)$ , $\int e (t) = \int q_{d} (t) - \int q (t)$ , $q_{d} (t)$ is the desired trajectory of the cable-driven unit with the coupling relationship, and $q (t)$ is the actual trajectory of the cable-driven unit, obtained from the observed joint trajectory through the radius relationship.

Using the above equation, the actual angles and angular velocities of the three series joints are observed as feedback variables, and the actual angles and angular velocities of the cable-driven unit are obtained from the radius relationship. The desired angles and angular velocities of the cable-driven unit are obtained by inverse kinematics, decoupling method, and third-order integral chain differentiator. The differences are substituted into the control law, and the errors are reduced by continuous iteration, so that the end-effector position finally meets the expectation.

The complete control system block diagram is shown in Figure 16. First, based on the desired trajectory, the desired positions X_desired, Y_desired, and Z_desired of the robotic arm at the current time t are obtained. Then, the desired angles $γ_{1}$ , $γ_{2}$ , $γ_{3}$ , $γ_{4}$ , $γ_{5}$ , and $γ_{6}$ of the six joints are calculated by inverse kinematics. The decoupled joint angles $θ_{1}$ , $θ_{2}$ , $θ_{3}$ , $θ_{4}$ , $θ_{5}$ , and $θ_{6}$ are obtained by the joint decoupling calculation. The decoupled joint angles $q_{d} (t)$ are input into the PID controller, which outputs the torque $u (k)$ for each joint and drives the simulation model to interact with the environment. Meanwhile, the actual joint angles $q (t)$ of the robotic arm in the environment are fed back to the PID controller.

TD3 deep reinforcement learning algorithm

The twin delayed deep deterministic policy gradient (TD3)^38–40 reinforcement learning algorithm is an actor-critic, off-policy DRL algorithm for continuous control, in which the agent outputs continuous actions. In DRL, overestimation of the Q value can trap the algorithm in a suboptimal policy and make the model difficult to converge. To address this, the TD3 algorithm combines DDPG and Double Q-learning: it uses two sets of networks to estimate the Q value and selects the smaller estimate as the update target, a conservative choice that avoids overestimating the Q value. It also adds target policy smoothing: when the target value is calculated, a noise disturbance is added to the action of the next state, which makes the value evaluation more accurate.

In addition, the TD3 algorithm adopts a delayed update strategy, that is, the critic network is updated several times before the actor network is updated once, which weakens the influence of overestimation bias and enhances stability.

Figure 17 describes the network structure of the TD3 algorithm, where $s_{t}$ , $a_{t}$ , and $r_{t}$ represent the state, action, and reward at time t, respectively. $θ^{μ}$ , $θ_{1}^{Q}$ , and $θ_{2}^{Q}$ represent the neural network parameters of the online actor network, online critic network 1, and online critic network 2, respectively. $θ^{μ^{'}}$ , $θ_{1}^{Q^{'}}$ , and $θ_{2}^{Q^{'}}$ represent the neural network parameters of the target actor network, target critic network 1, and target critic network 2, respectively. $ε$ is a noise disturbance that obeys a normal distribution.

Figure 17.

TD3 network structure.

TD3 has two critic networks for fitting the Q function (value function) and avoids overestimation by selecting the smaller Q value, which enhances stability. Each critic network has two copies, online and target. The actor network is used to fit the deterministic policy and likewise has online and target copies.

In the update process, the TD3 algorithm uses the two critic networks and chooses the smaller Q value when calculating the temporal-difference target value $y_{t}$ . In addition, to smooth the target policy, a small noise $ε$ obeying a normal distribution is added to the action selected by the policy

\begin{cases} y_{t} = r_{t} + γ \cdot \min (Q_{1} (s_{t + 1}, a_{t + 1}), Q_{2} (s_{t + 1}, a_{t + 1})) \\ ε \sim \mathrm{clip} (N (0, σ), - c, c) \end{cases}

where $γ$ is the discount factor, and the noise $ε$ is drawn from a normal distribution with a mean of 0 and a standard deviation of σ and then clipped to the range (−c, c).
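The target computation can be illustrated without any deep learning library. In the sketch below the two target critics are stand-in Python callables, and the values of σ and c are placeholders (the article does not list them); only the structure, clipped Gaussian noise on the next action followed by the minimum of the two Q estimates, follows the text. Clipping the smoothed action to the actuator range, which many implementations also do, is omitted.

```python
import random


def clipped_noise(sigma: float, c: float, rng: random.Random) -> float:
    """Draw N(0, sigma) noise and clip it to [-c, c]."""
    return max(-c, min(c, rng.gauss(0.0, sigma)))


def td3_target(reward, next_state, next_action, q1_target, q2_target,
               gamma=0.99, sigma=0.2, c=0.5, rng=None):
    """y_t = r_t + gamma * min(Q1', Q2') at the noise-smoothed next action."""
    rng = rng or random.Random()
    smoothed = [a + clipped_noise(sigma, c, rng) for a in next_action]
    q_min = min(q1_target(next_state, smoothed), q2_target(next_state, smoothed))
    return reward + gamma * q_min


# Toy constant critics standing in for the two target critic networks.
q1 = lambda state, action: 1.0
q2 = lambda state, action: 2.0
y = td3_target(0.5, [0.0], [0.1, 0.2, 0.3], q1, q2, rng=random.Random(0))
print(y)
```

Taking the minimum means the more pessimistic critic always sets the target, which is the mechanism that counters Q-value overestimation.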

Gradient descent is performed on the critic networks according to the loss function, and the neural network parameters of online critic network 1 and online critic network 2 are updated

\begin{cases} \nabla_{θ_{1}^{Q}} L = \nabla_{θ_{1}^{Q}} N^{- 1} \sum {(y_{t} - Q_{1} (s_{t}, a_{t}))}^{2} \\ \nabla_{θ_{2}^{Q}} L = \nabla_{θ_{2}^{Q}} N^{- 1} \sum {(y_{t} - Q_{2} (s_{t}, a_{t}))}^{2} \end{cases}

Then $- Q_{1} (s_{t}, a_{t})$ is used as a loss function for backpropagation to update parameter $θ^{μ}$ of the online actor network and obtain the optimal policy.

Finally, during the training, a soft update is adopted to update the target actor network and the target critic networks

\begin{cases} θ_{1}^{Q^{'}} \leftarrow τ θ_{1}^{Q} + (1 - τ) θ_{1}^{Q^{'}} \\ θ_{2}^{Q^{'}} \leftarrow τ θ_{2}^{Q} + (1 - τ) θ_{2}^{Q^{'}} \\ θ^{μ^{'}} \leftarrow τ θ^{μ} + (1 - τ) θ^{μ^{'}} \end{cases}

where $τ$ is generally taken as 0.001.
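The soft update is a simple element-wise blend of online and target parameters. The sketch below treats each network as a flat list of weights, which is a simplification for illustration; the function name is hypothetical.

```python
def soft_update(target, online, tau=0.001):
    """Return tau * online + (1 - tau) * target, element by element."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


target_params = [0.0, 1.0]
online_params = [1.0, 1.0]
target_params = soft_update(target_params, online_params)
print(target_params)
```

With τ = 0.001 the target moves only 0.1% of the way toward the online network per update, so the temporal-difference target changes slowly between gradient steps.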

State and action space design

In the process of humanoid CDHR movement, the pose of the end-effector relative to the base coordinate system will change due to joint coupling and end-effector position change. To achieve the dynamics control of end-effector orientation and obtain the desired orientation, the TD3 DRL algorithm is adopted to conduct orientation training for the coaxial SPM. The action space and state space are designed as follows:

The dimension of the action space is designed to be 3, and each action is defined as a torque signal (transmitted through the flexible cable) driving each joint of the coaxial SPM, and normalized (0–1).

The dimension of the observation space is designed to be 15, and the observations are defined as follows:

The angles of the three joints $θ_{4}$ , $θ_{5}$ , $θ_{6}$ , the angular velocities ${\dot{θ}}_{4}$ , ${\dot{θ}}_{5}$ , ${\dot{θ}}_{6}$ .

Yaw, pitch, and roll angles at the end-effector and yaw, pitch, and roll angular velocities.

Action value of the previous time step $a_{t - 1}$ .

Reward function and termination condition design

In this article, the TD3 algorithm is used to train the end-effector to achieve the desired orientation. To make the training converge and stabilize faster, a reward function combining continuous and sparse rewards is adopted to encourage the agent to learn the optimal policy as soon as possible and reach the desired orientation. The reward function designed for the agent is

R = r_{1} + r_{2} + r_{3} + r_{4} + r_{5} + r_{6} + r_{7} + r_{8}

where $r_{1}$ is the continuous reward that penalizes the first joint of the coaxial SPM for exceeding its allowable motion range. $θ_{4}$ < (− $π$ ) and $θ_{4}$ > $π$ are Boolean expressions that evaluate to 0 when the joint angle $θ_{4}$ is within the allowed range and to 1 when it is outside the range

\begin{cases} r_{11} = | (θ_{4} - (- π)) * (θ_{4} < (- π)) | \\ r_{12} = | (θ_{4} - π) * (θ_{4} > π) | \end{cases}

r_{1} = λ_{1} (r_{11} + r_{12})

Similarly, the same is true for the second and third joints of the coaxial SPM.

\begin{cases} r_{21} = | (θ_{5} - (- π)) * (θ_{5} < (- π)) | \\ r_{22} = | (θ_{5} - π) * (θ_{5} > π) | \end{cases}

r_{2} = λ_{1} (r_{21} + r_{22})

\begin{cases} r_{31} = | (θ_{6} - (- π)) * (θ_{6} < (- π)) | \\ r_{32} = | (θ_{6} - π) * (θ_{6} > π) | \end{cases}

r_{3} = λ_{1} (r_{31} + r_{32})

To achieve the pose target, a sparse reward is used that depends on whether the end-effector orientation of the robot reaches the target range in each episode. The reward is divided into three ranges, each gated by a Boolean test that takes the value 1 when the condition is met and 0 otherwise

r_{4} = \begin{cases} λ_{2} (| \mathrm{roll} | \leq π / 36) \\ λ_{3} (| \mathrm{roll} | \leq π / 180) \\ λ_{4} (| \mathrm{roll} | = 0) \end{cases}

The same is true for the end pitch and yaw angles

r_{5} = \begin{cases} λ_{2} (| \mathrm{pitch} | \leq π / 36) \\ λ_{3} (| \mathrm{pitch} | \leq π / 180) \\ λ_{4} (| \mathrm{pitch} | = 0) \end{cases}

r_{6} = \begin{cases} λ_{2} (| \mathrm{yaw} | \leq π / 36) \\ λ_{3} (| \mathrm{yaw} | \leq π / 180) \\ λ_{4} (| \mathrm{yaw} | = 0) \end{cases}

To account for the cost of performing actions, a constant reward is provided at each time step, and a penalty is subtracted for the action performed at the previous time step

r_{7} = \frac{λ_{5} T_{s}}{T_{f}}

where T_s and T_f are environmental sampling time and environmental final simulation time, respectively.

r_{8} = λ_{6} \sum_{i} (u_{t - 1}^{i})^{2}

To prevent the agent from over-exploring and wasting time in regions where the target clearly cannot be reached, a reasonable interval is set as a restriction. The yaw, pitch, and roll angles of the end-effector are each tested with a Boolean condition, and once any of them leaves the interval, Isdone is set to 1 to end the current episode and start the next one

Isdone = (| \mathrm{roll} | \geq π / 3) \lor (| \mathrm{pitch} | \geq π / 3) \lor (| \mathrm{yaw} | \geq π / 3)
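The reward terms and the termination test can be assembled as in the sketch below. The function names are hypothetical, and one interpretation is assumed and should be treated with care: the three orientation tiers are taken to add up whenever their Boolean conditions hold (if only the tightest satisfied tier were intended, the sum would be replaced by a maximum). The reward factors and the sampling and final times are the Table 3 values.

```python
import math

# (lambda_1 ... lambda_6) as listed in Table 3.
LAMBDAS = (-10.0, 2.0, 5.0, 10.0, 25.0, -0.02)


def limit_penalty(theta: float, lam1: float, limit: float = math.pi) -> float:
    """r_1..r_3: lam1 times the amount by which a joint leaves [-limit, limit]."""
    below = abs((theta + limit) * (theta < -limit))
    above = abs((theta - limit) * (theta > limit))
    return lam1 * (below + above)


def orientation_bonus(angle: float, lam2: float, lam3: float, lam4: float) -> float:
    """r_4..r_6 under the assumption that satisfied tiers are summed."""
    a = abs(angle)
    return lam2 * (a <= math.pi / 36) + lam3 * (a <= math.pi / 180) + lam4 * (a == 0)


def reward(joint_angles, rpy, prev_action, lam=LAMBDAS, ts=0.15, tf=30.0) -> float:
    """Total reward R = r_1 + ... + r_8 for one time step."""
    r_limits = sum(limit_penalty(th, lam[0]) for th in joint_angles)
    r_orient = sum(orientation_bonus(a, lam[1], lam[2], lam[3]) for a in rpy)
    r_alive = lam[4] * ts / tf                            # r_7
    r_effort = lam[5] * sum(u**2 for u in prev_action)    # r_8
    return r_limits + r_orient + r_alive + r_effort


def is_done(rpy, bound: float = math.pi / 3) -> bool:
    """Episode ends as soon as roll, pitch, or yaw reaches the bound."""
    return any(abs(a) >= bound for a in rpy)


r = reward((0.0, 0.0, 0.0), (0.01, 0.05, 0.2), (0.5, 0.5, 0.5))
print(r, is_done((0.01, 0.05, 0.2)), is_done((0.0, 0.0, 1.1)))
```

In this example the roll angle satisfies two tiers, the pitch angle one, and the yaw angle none, so the orientation terms contribute 9 before the small per-step bonus and the action penalty are added.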

Algorithm flow

In summary, the dynamics control process of the humanoid CDHR based on the DC-PID-TD3 algorithm is shown as follows:

Algorithm 3.

DC-PID-TD3 algorithm update process.

Input: Environment e, state space s, action space a, constant matrices K_P , K_I , K_D , expected trajectory c, desired orientation n

1 Initialize empty replay buffer D;
2 Initialize parameters of the policy network and evaluation network;
3 for Episode ≤ MaxEpisode do
4     Reset environment;
5     for each environment step do
6         Observe $θ_{r i}$ , ${\dot{θ}}_{r i}$ , ${\ddot{θ}}_{r i}$ , i = 1, 2, 3;
7         Calculate $q_{r i}$ , ${\dot{q}}_{r i}$ , ${\ddot{q}}_{r i}$ , i = 1, 2, 3;
8         Calculate $θ_{d i}$ , i = 1, 2, 3 using equations (3) to (13) given c;
9         Calculate ${\dot{θ}}_{d i}$ , ${\ddot{θ}}_{d i}$ , i = 1, 2, 3 using equation (36);
10        Calculate $q_{d i}$ , ${\dot{q}}_{d i}$ , ${\ddot{q}}_{d i}$ , i = 1, 2, 3 using equations (25) to (31);
11        Calculate $τ_{i}$ , i = 1, 2, 3 using equation (38) given K_P , K_I , K_D ;
12        Observe state $s_{t}$ and sample action from policy $a_{t} \sim π_{Φ} (a_{t} | s_{t})$ ;
13        Calculate $a_{t}$ , execute action $a_{t}$ ; the interaction with the environment yields the reward $r_{t}$ and the new state $s_{t + 1}$ ; take $s_{t + 1}$ as the state value for the next time step;
14        Store transition ( $s_{t}$ , $a_{t}$ , $r_{t}$ , $s_{t + 1}$ ) in the replay buffer D;
15    end for
16    if it’s time to update then
17        for each gradient step do
18            Sample a batch of transitions B = ${(s_{t}, a_{t}, r_{t}, s_{t + 1})}$ from D;
19            Calculate $y_{t}$ using equation (39);
20            Update $θ_{1}^{Q}$ and $θ_{2}^{Q}$ using equation (40);
21            Use $- Q (s_{t}, a_{t})$ as a loss function for backpropagation to update parameter $θ^{μ}$ ;
22            Update $θ^{μ^{'}}$ , $θ_{1}^{Q^{'}}$ , and $θ_{2}^{Q^{'}}$ using equation (41);
23        end for
24    end if
25 end for
26 Output: Torque $τ_{i}$ , i = 1, 2, 3, 4, 5, 6.

Simulation verification

In this article, we use Simulink as the simulation environment, in which the DC-PID control and TD3 reinforcement learning training are carried out simultaneously. The control block diagram is shown in Figure 18, the desired trajectory equation is shown in equations (33) and (34), and the desired orientation is that the normal vector n of the moving platform is always perpendicular to the ground.

Figure 18.

Control system block diagram.

Simulation parameter setting

The DC-PID-TD3 parameters are shown in Table 3. The proportional gain K_P controls the proportional relationship between the control output and the error signal of the PID controller and is used to adjust the system’s response speed and stability. Increasing K_P can increase the system’s sensitivity and response speed, but it may also lead to oscillations and overcorrection. The integral gain K_I controls the integral part of the error signal in the PID controller and is used to eliminate static errors and biases in the system, thereby improving stability and accuracy. Increasing K_I can speed up the system’s stabilization process, but it may also lead to overshoot and oscillations. The derivative gain K_D controls the derivative part of the error signal in the PID controller and is used to predict the system’s future trend, thereby improving response speed and suppressing oscillations. Increasing K_D can reduce overshoot and oscillations in the system, but it may also increase sensitivity to noise.

B is the mini-batch size, that is, the number of samples randomly drawn from the experience replay buffer each time the model is updated. A larger mini-batch size typically helps to obtain a more accurate estimate of the gradients, thereby accelerating the convergence of training. On the other hand, a smaller mini-batch size sometimes leads to better generalization performance. E controls the number of training iterations the model undergoes over the entire data set. More epochs are beneficial for improving the model’s generalization ability. However, excessive epochs may lead to overfitting. M controls how many times the model sees the data within each epoch. More steps per epoch help to accelerate training and improve the model’s convergence. However, too many steps per epoch may lead to overfitting to the training data and make the model more susceptible to noise in the training data. The sample time T_s influences how dynamic changes in the environment are perceived. A smaller sample time provides more frequent feedback, allowing the algorithm to explore the environment more quickly and make real-time adjustments. On the other hand, a larger sample time reduces the computational burden, decreases hardware resource requirements, and improves the efficiency of data sampling.

n refers to the number of hidden layers in the neural network. More hidden layers give the neural network a stronger fitting ability, enabling it to learn more complex function relationships. However, too many hidden layers might also lead to overfitting and increase the training time and computational resource requirements. s specifies the number of neurons in each hidden layer. Larger hidden layers increase the fitting ability of the neural network, enabling it to better learn and adapt to complex data patterns. However, excessively large hidden layers might lead to overfitting, especially when dealing with a small data set or in the presence of significant noise.

D is the replay buffer, a data structure used to store the experience data generated from the interaction between the agent and the environment. A larger replay buffer can store more experience data, helping to reduce the correlation between samples. However, a larger replay buffer increases memory consumption and leads to longer training times. $l_{r 1}$ and $l_{r 2}$ refer to the learning rates used to update the parameters of the critic and actor neural networks, respectively. A smaller learning rate results in smaller parameter updates, making the training process more stable, but it may also require more training time. On the other hand, a larger learning rate may lead to model instability, including oscillations or failure to converge.

The discount factor $γ$ balances the importance of short-term and long-term rewards. A smaller discount factor emphasizes current rewards, while a larger discount factor pays more attention to future rewards. By adjusting the discount factor, the agent can be made to prioritize either immediate rewards or long-term returns. The soft update factor τ balances the stability and flexibility of the target network. A smaller soft update factor makes the updates to the target network smoother and more stable, helping to slow down the parameter changes and improve training stability. Conversely, a larger soft update factor causes the target network to approach the current network more quickly, which can speed up training but may also introduce some instability. $λ_{i}$ is used to adjust the magnitude of the reward. Through repeated debugging and experiments on the simulation model, we arrived at the parameter values shown in Table 3.

Table 3.

Description of the parameters used by DC-PID-TD3.

Parameter	Description	Value
K_P	Proportional coefficient	diag{15, 15, 15}
K_I	Integral coefficient	diag{0.5, 0.5, 0.5}
K_D	Differential coefficient	diag{1, 1, 1}
B	Mini-batch size	256
E	Number of epochs	2000
M	Steps per epoch	200
T_s	Sample time	0.15
T_f	Simulation end time	30
n	Number of hidden layers	2
s	Size of hidden layer	512
D	Replay buffer size	1.08e6
$l_{r 1}$	Learning rate of critic	1e-3
$l_{r 2}$	Learning rate of actor	1e-3
γ	Discount factor	0.99
τ	Soft update factor	0.001
λ₁	Reward factor 1	−10
λ₂	Reward factor 2	2
λ₃	Reward factor 3	5
λ₄	Reward factor 4	10
λ₅	Reward factor 5	25
λ₆	Reward factor 6	−0.02

To further validate the effectiveness of the DC-PID-TD3 controller, we also conducted simulations using a simple conventional PID controller, with the PID controller parameters shown in Table 4.

Table 4.

The parameters used by conventional PID.

Parameter	Description	Value
K_P	Proportional coefficient	diag{10, 10, 10}
K_I	Integral coefficient	diag{0.3, 0.4, 0.4}
K_D	Differential coefficient	diag{0.8, 0.8, 0.8}

Simulation result

Figure 19 shows the simulation results. Figure 19(a) and (b) depict the trajectory tracking and error using the DC-PID control; the maximum error is 0.26 mm on the X-axis, 0.29 mm on the Y-axis, and 0.02 mm on the Z-axis. Figure 19(c) to (e) depict the training results (average reward) using TD3 DRL, the trained orientation variation, and the orientation change of the end-effector; the maximum error is $1.26 °$ in roll, $1.20 °$ in pitch, and $2.46 °$ in yaw. Figure 19(f) and (g) depict the output torques of the DC-PID-TD3 control method for the series and parallel joints, respectively. The simulation results using a simple conventional PID controller, shown in Figure 19(h), indicate a maximum roll error of $4.58 °$ , pitch error of $7.44 °$ , and yaw error of $5.90 °$ . These results verify that the DC-PID-TD3 control method proposed in this article converges quickly and performs well, and applying it to a humanoid CDHR offers a new way to address the problems of complexity, nonlinearity, coupling, and uncertainty of dynamics parameters.

Figure 19.

DC-PID-TD3 simulation results graph: (a) trajectory tracking diagram, (b) trajectory error diagram, (c) training reward chart, (d) orientation change chart, (e) diagram of orientation change, (f) tandem joint drive torque diagram, (g) parallel joint drive torque diagram (normalized), and (h) diagram of orientation change (conventional PID).

Experimental evaluation and validation

Building on the preceding kinematic analysis, coupling analysis, and control algorithm simulation, experiments are carried out on the humanoid CDHR prototype using the TwinCAT3 and Matlab software platforms.

Experimental setting

The experimental platform is mainly composed of the upper computer, electrical control system, robot body, and motion capture system. The upper computer runs the TwinCAT3 and Matlab software platforms: Matlab implements the motion control algorithm, and TwinCAT3 controls the lower computer. The electrical control system includes the controller, driver, IO module, servo motor, ball screw, and so on. The robot is the humanoid CDHR prototype. The motion capture system uses Nokov cameras to collect the actual trajectory of the robot, which is compared with the theoretical trajectory to verify the effectiveness of the decoupling method and control algorithm. Figure 20 shows the construction of the humanoid CDHR prototype.

Figure 20.

The humanoid CDHR prototype. CDHR: cable-driven hybrid robot.

Experimental analysis

Experimental validation of the decoupling method

To verify the feasibility and effectiveness of the decoupling method in practical application, trajectories are planned for the first three joints of the cable-driven hybrid robotic arm prototype, and each joint is rotated according to the motion law in Table 5. The experiments are conducted in the coupled and decoupled states, and the rotation angle of each joint is collected by the motion capture system and compared with the desired rotation angle. Figure 21 shows the experimental verification of the decoupling method.

Table 5.

Robot joint angle information.

t(s)	24	72
Joint 1(deg)	10	−10
Joint 2(deg)	30	−30
Joint 3(deg)	−30	30
Joint 4(deg)	0	0
Joint 5(deg)	0	0
Joint 6(deg)	0	0

Figure 21.

Experimental validation of the decoupling method: (a) angle 1,2,3 changes in two states, (b) angle 4,5,6 changes in two states, (c) error in the coupling state, and (d) error in the decoupling state.

The experimental results show that, in the coupled state, the maximum rotation error across the six joints is about $10 °$ . Joint 3 is most seriously affected by the coupling, with a maximum error of $10.27 °$ , while the parallel joints are less affected, with maximum errors of $4.60 °$ , $2.27 °$ , and $1.07 °$ , respectively. This is because the opposite motions of joint 2 and joint 3 cause their coupling effects to cancel each other out, so the parallel joints are affected only by the coupling caused by joint 1. After decoupling, the maximum error in series joint rotation is $1.15 °$ , and the maximum error in parallel joint rotation is $1.54 °$ . The average maximum angular error falls from $3.97 °$ in the coupled state to $0.89 °$ in the decoupled state, a reduction of 77.58%, which demonstrates the effectiveness of the decoupling method.

Experimental verification of DC-PID control

In this paper, the PID control based on the decoupling method is experimentally verified, and the mutually independent position information of each joint is obtained on the basis of the inverse kinematic solution and decoupling method, then PID control is adopted to realize the trajectory tracking of the end, and the trajectory equation as Eq. 33 - 34. The SDK of Nokov motion capture system is used to provide real-time feedback of the series joint position information and the end-effector position information, and to compare the actual trajectory with the desired trajectory.

As shown in Figure 22, the maximum errors of open-loop trajectory tracking in the three axes are 1.56 mm, 0.85 mm, and 1.27 mm, whereas those under DC-PID control are 1.38 mm, 0.55 mm, and 0.76 mm. Compared with the open-loop experiment, the maximum tracking error is reduced by 11.54%, 35.29%, and 40.16%, respectively, which demonstrates the good performance of the DC-PID control method.
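
The three percentages can be checked from the reported maxima, taking the axes in the order listed. The symbols $\eta_i$, $e_i^{\mathrm{open}}$, and $e_i^{\mathrm{PID}}$ are introduced here only for this check and do not appear elsewhere in the article.

```latex
\[
\eta_i = \frac{e_i^{\mathrm{open}} - e_i^{\mathrm{PID}}}{e_i^{\mathrm{open}}} \times 100\%,
\qquad
\eta_1 = \frac{1.56 - 1.38}{1.56} \approx 11.54\%,\quad
\eta_2 = \frac{0.85 - 0.55}{0.85} \approx 35.29\%,\quad
\eta_3 = \frac{1.27 - 0.76}{1.27} \approx 40.16\%.
\]
```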

Figure 22.

Experimental verification of the control method: (a) trajectory tracking, (b) open-loop tracking error, and (c) DC-PID tracking error.

A multibody model of the humanoid CDHR is built in the Simscape modeling environment, and the physical simulation is visualized to provide the simulation environment for the later sections of the paper.

Secondly, the kinematic analysis of the robot is carried out. To address the joint coupling caused by the multijoint cable drive, a decoupling method is proposed through joint coupling analysis so that the motions of the joints do not affect one another and accurate motion results are obtained, and the kinematic model is established.

Thirdly, to address the complex nonlinearity, coupling, and uncertainty of the dynamics parameters, and considering the cable forces on the humanoid CDHR system, this article exploits the hybrid characteristics of the robot and proposes a control method combining DC-PID control and the TD3 DRL algorithm to realize trajectory tracking control and orientation training.

Finally, the prototype of the humanoid CDHR is built, and verification experiments of the decoupling method and the control method are carried out. The analysis of the experimental results demonstrates the feasibility and effectiveness of both methods.

In the future, further research will be conducted to improve the dynamics model of the humanoid CDHR and to establish a model-based intelligent control method that accounts for flexibility, in order to meet the requirements of practical human–machine collaboration.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Key Research and Development Program of China (No. 2022YFB4702501), the National Natural Science Foundation of China (No. 52175013), and Key Science and Technology Special Project of Anhui Province (202203a05020007).

ORCID iDs

Sen Qian

Zhe Wu

Deep reinforcement learning and decoupling proportional-integral-derivative control of a humanoid cable-driven hybrid robot
