Sage Journals: Discover world-class research

Abstract

Redundant manipulators are flexible enough to adapt to complex environments, but their controller is also required to be specific for their extra degrees of freedom. Inspired by the morphology of snakes, we propose a path planning algorithm named Swinging Search and Crawling Control, which allows the snake-like redundant manipulators to explore in complex pipeline environments without collision. The proposed algorithm consists of the Swinging Search and the Crawling Control. In Swinging Search, a collision-free manipulator configuration that of the end-effector in the target point is found by applying reinforcement learning to self-motion, instead of designing joint motion. The self-motion narrows the search space to the null space, and the reinforcement learning makes the algorithm use the information of the environment, instead of blindly searching. Then in Crawling Control, the manipulator is controlled to crawl to the target point like a snake along the collision-free configuration. It only needs to search for a collision-free configuration for the manipulator, instead of searching collision-free configurations throughout the process of path planning. Simulation experiments show that the algorithm can complete path planning tasks of hyper-redundant manipulators in complex environments. The 16 degrees of freedom and 24 degrees of freedom manipulators can achieve 83.3% and 96.7% success rates in the pipe, respectively. In the concentric pipe, the 24 degrees of freedom manipulator has a success rate of 96.1%.

Keywords

Redundant manipulator path planning reinforcement learning self-motion Crawling Control Swinging Search

Introduction

In recent decades, more and more researchers have been involved in the study of redundant manipulators to explore complex narrow environments, such as the natural gas pipes, sewer pipes, and tailpipes of airplanes. A manipulator is termed kinematically redundant when it possesses more degrees of freedom (DoFs) than it is needed to execute a given task.¹ The extra DoFs allow the manipulator to perform complex tasks, making path planning a challenge though.

To address the path planning of redundant manipulators, some early researchers proposed geometric methods. Chirikjian and Burdick² proposed the backbone curve approach for hyper-redundant robot kinematics. Later, a shape control geometric analysis method³ was proposed by Mochiyama to control the whole shape of a manipulator. But such algorithms are computationally intensive. To reduce computational complexity, Yahya et al.⁴ proposed geometric constraints that limit the angles between the adjacent links to be equal. However, the flexibility of the manipulators is also limited due to the introduced constraints.

Other researchers proposed the method based on optimization theory, in which obstacle avoidance can be achieved by optimizing the objective distance function between the manipulators and the obstacles. Zucker et al.⁵ proposed the covariant Hamiltonian optimization for notion planning method which can quickly converge with generating a smooth collision-free trajectory. Zhao et al.⁶ applied the ant colony algorithm to plan the collision-free optimal path of the end-effector. Collins and Shen⁷ proposed an optimization framework path planning with swarm optimization to efficiently calculate the approximate solution of the collision-free path planning. Artificial potential field method is a classic algorithm for path planning of mobile robots, which can also be applied to the planning of manipulators.^8,9 In this algorithm, the obstacle produces repulsive virtual force and the target point generates attractive virtual force to the point, the resultant force of them push this point to achieve a collision-free path toward the target. These methods require calculating the distance from the manipulator to the obstacles and therefore also require modeling the surface of the obstacles, which is sometimes difficult to complete.

In addition, the random sampling methods are efficient path planning algorithms for redundant manipulators. The probabilistic roadmaps (PRM) method¹⁰ proposed by Kavraki et al. randomly samples in the configuration space to search for collision-free configurations.¹¹ The sampling results are stored in a PRM. For any given start and goal configurations of the manipulator, the motion planning can be transformed into the problem of connecting the two corresponding nodes in the roadmap.¹² Compared with the optimization theory methods, the random sampling methods do not require figuring out the objective distance function, which reduces the computational complexity and improves the algorithm applicability. However, as the DoFs increase, methods of this kind are also computationally complex.

Facing the unknown environment in application, Um and Ryu¹³ proposed RRT-Cabomba algorithm based on the sampling method RRT¹⁴ which controls the IPA sensitive skin equipped robot to perceive and plan alternately. Further, they also adopted the bug algorithm in RRT-Cabomba to implement local minima avoidance.

With the development of artificial intelligence, applying genetic algorithm in the obstacle avoidance of redundant manipulators^15,16 is another method for solving the optimization problems in planning, without requiring to calculate the distance from the manipulator to the obstacles. As a search process, the genetic algorithm copies and mutates the parameters, and then filters them to find the parameters that make the fitness function better, so as to control the manipulator to avoid obstacles. Reinforcement learning (RL) has also been applied to the path planning of redundant manipulators. Hua et al.¹⁷ applied the RL method to train the agent to plan the end-effector motion and self-motion^18
–20 separately. Combining the gradient projection method²¹ decouples the motion of redundant manipulators into end-effector motion and self-motion. In this algorithm, the agent has to learn the entire planning process without making full use of the nature of self-motion, which still causes great computational difficulties.

In nature, snakes crawl along the path of their heads during crawling. Their bodies will not collide with the environment, as long as the heads can pass smoothly. Inspired by this biological phenomenon, this article proposes a path planning algorithm named Swinging Search and Crawling Control (SSCC) for snake-like redundant manipulators. This algorithm consists of the Swinging Search and the Crawling Control, allowing the redundant manipulators with repetitive modules to explore the complex pipeline environments.

By self-motion, the Swinging Search algorithm generates collision-free configurations as the directive configurations, in which the end-effector is in the target point. And then the manipulator can crawl into the pipe along the directive configurations by the Crawling Control. During crawling, each modular link repeats the motion of the foremost module.

The major contributions of this article are as follows.

Inspired by the biological characteristics of snakes, this article innovatively proposes the SSCC algorithm for snake-like redundant manipulators to be adapted to narrow environments such as pipes.

The SSCC algorithm has low computational complexity, since only one directive configuration needs to be searched.

The SSCC algorithm has good expansibility and adaptability for modular snake-like redundant manipulators with n universal joints.

Snake-like redundant manipulators

In this section, each subsection presents the mechanical structure description, the forward kinematics, and the self-motion of the snake-like redundant manipulators in turn. The formulation presented here is to elaborate on what the manipulators look like and how to control them.

Mechanical structure design

In this article, to be analogous to snakes in morphology, the modular redundant manipulators with n universal joints are designed. The manipulators are $n \times U$ open chains.²² Each module consists of a universal joint and a fixed-length rigid body link, as shown in Figure 1. Modularity allows the DoFs of the manipulators to be easily expanded by increasing the number of modules. The Denavit–Hartenberg (D-H) parameters of the snake-like redundant manipulators are shown in Table 1.

Figure 1.

The structure of the general snake-like redundant manipulator with 12 joints.

Table 1.

The D-H parameters of the snake-like redundant manipulators.

Joint n	$α_{i}$ (°)	$a_{i}$ (m)	$d_{i}$ (m)	$θ_{i}$ (°)
1	90	0	0	0
2	−90	l	0	0
3	90	0	0	0
4	−90	l	0	0
⋮	⋮	⋮	⋮	⋮
2n − 1	90	0	0	0
2n	−90	l	0	0

D-H: Denavit–Hartenberg.

By the D-H parameters, the homogeneous transformation matrices for kinematics are expressed as

\begin{array}{l} T_{i}^{i + 1} = [\begin{matrix} cos θ_{i} & - sin θ_{i} cos α_{i} & sin θ_{i} sin α_{i} & a_{i} cos θ_{i} \\ sin θ_{i} & cos θ_{i} cos α_{i} & - cos θ_{i} sin α_{i} & a_{i} sin θ_{i} \\ 0 & sin α_{i} & cos α_{i} & d_{i} \\ 0 & 0 & 0 & 1 \end{matrix}] \\ i = 0, 1, 2, \dots,2 n - 1 \end{array}

Forward kinematics

In forward kinematics, the pose of the end-effector $x = {[\begin{matrix} x & y & z & α & β & γ \end{matrix}]}^{T}$ , in which the first three and the last three variables represent the position and the orientation of the manipulator respectively. The pose $x$ can be calculated as follows

x = f_{fkine} (q) = \prod_{i = 1}^{2 n} T_{i}^{i + 1}

where $f_{fkine}$ is the function of the forward kinematics, $q = {[\begin{matrix} θ_{1} & θ_{2} & \dots & θ_{2 n} \end{matrix}]}^{T}$ is the joint motion vector of the manipulators in the configuration space.

Self-motion

The i-th column of the Jacobian matrix $j_{i}$ can be calculated by the vector product method as follows

j_{i} = [\begin{matrix} z_{i} \times^{i} ​ p_{E}^{0} \\ z_{i} \end{matrix}]

where $z_{i}$ is the i-th axis vector and $^{i} p_{E}^{0}$ is the vector from the end-effector to the i-th joint. Then the Jacobian matrix can be represented as

J = [\begin{matrix} j_{1} & j_{2} & \dots & j_{2 n} \end{matrix}]

The pseudo inverse of Jacobian matrix is defined as

J^{†} = J^{T} {(J J^{T})}^{- 1}

With the Jacobian matrix, the end-effector velocity vector $\dot{x}$ can be expressed as a function of the joint velocity vector $\dot{q}$

\dot{x} = [\begin{matrix} v \\ w \end{matrix}] = J \dot{q}

where $v$ and $w$ are the linear velocity vector and the angular velocity vector of the end-effector, respectively.

For control purposes, it is common to solve $\dot{q}$ in equation (6) by specifying the desired $\dot{x}$ . Since the manipulators are redundant, the dimension of $\dot{q}$ is higher than the dimension of $\dot{x}$ . In this way, equation (6) can be regarded as a set of underdetermined equations, and the solution of which can be formalized as follows

\begin{array}{l} \dot{q} & = {\dot{q}}_{S} + {\dot{q}}_{N} \\ = J^{†} \dot{x} + (I - J^{†} J) M \end{array}

where the term $J^{†} \dot{x}$ is the minimum norm solution ${\dot{q}}_{S}$ of equation (6). The term $(I - J^{†} J) M$ represents the homogeneous solutions ${\dot{q}}_{N}$ of $J \dot{q} = 0$

\begin{array}{l} J {\dot{q}}_{N} & = J (I - J^{†} J) M \\ = (J - J J^{†} J) M \\ = 0 \end{array}

All the homogeneous solutions form the null space of $J$ . In the null space, redundant manipulators can change their configurations without changing the pose of the end-effector which is named the self-motion. The self-motion can be used in the gradient projection method²¹ to realize the obstacle avoidance of redundant manipulators

\begin{matrix} M = k \nabla H (q) \\ = k {[\begin{matrix} \frac{\partial H (q)}{\partial q_{1}} & \frac{\partial H (q)}{\partial q_{2}} & \dots & \frac{\partial H (q)}{\partial q_{2 n}} \end{matrix}]}^{T} \end{matrix}

where k is a scale factor, $H$ is the objective function which represents the distance between the manipulator and obstacles, and $\nabla H$ is the gradient of $H$ . Then equation (7) can be derived as follows

\begin{array}{l} \dot{q} & = J^{†} \dot{x} + k (I - J^{†} J) \nabla H \end{array}

By specifying $\dot{x}$ and calculating $\nabla H$ , the control of redundant manipulators can be achieved by the following iterative equation

q (t + Δ t) = q (t) + \dot{q} Δ t

where $q (t)$ is the current joint positions and $q (t + Δ t)$ is the joint positions after an iteration in the next moment.

Framework of SSCC

In this article, a path planning algorithm named SSCC is proposed. Through this algorithm, the modular redundant manipulators can reach into the complex pipe environment without collision, allowing the end-effector to achieve the goal pose.

SSCC algorithm consists of two units, the Swinging Search and the Crawling Control. The Swinging Search generates collision-free configurations by applying RL to self-motion. The generated configurations are named directive configurations, in which the end-effector has reached the goal pose. In the Crawling Control, the controlled redundant manipulator crawls into the pipe along the directive configuration like a snake. The flowchart of SSCC is shown in Figure 2.

Figure 2.

The framework of Swinging Search and Crawling Control.

In Figure 2, s is the state of the environment fed back to the agent; $\nabla H$ is the action the agent decides to perform, which is the gradient in equation (9) of the objective function that represents the distance between the obstacle and the manipulator; r is the reward obtained by the agent. The above parameters will be described in detail in the “State–action design” section.

Since the length of the rigid link in the module cannot be ignored, the manipulator will inevitably deviate from the directive configuration during the Crawling Process, which may also cause the manipulator to collide. The Crawling Process is deterministic, which means that the degree of deviation can be calculated based on the directive configuration. Therefore, we propose the maximum deviation distance to evaluate the directive configuration to train the agent in RL, so that the Swinging Search can generate the directive configuration with a smaller maximum deviation distance to reduce the possibility of collision during crawling.

Both the Swinging Search and the Crawling Control include collision detection. In the Swinging Search, collision detection is used to determine whether to terminate the search. In the Crawling Control, if a collision is detected, then the directive configuration is invalid, and the Swinging Search should regenerate a new directive configuration.

Crawling Control

The Crawling Process refers to the process in which the manipulator M is controlled to move forward along the directive configuration $M (q_{d})$ . Through this control, the end-effector of the manipulator can reach the specified pose.

The flow of the Crawling Process is shown in the subsection “Crawling Process,” which specifies how the manipulator is controlled given a certain directive configuration.

Then in the subsection “Evaluation of directive configurations,” the evaluation method is proposed, in which the maximum deviation distance is designed to measure the deviation of the controlled manipulator from the directive configuration. For a specific configuration, the Crawling Process produces a unique control process. So the maximum deviation distance is the evaluation indicator of directive configurations. The smaller the maximum deviation distance, the better the directive configuration. This property can be used in the reward function of RL in Swinging Search, to help generate better directive configurations, so as to reduce the possibility of collisions in the Crawling Process.

Crawling Process

To illustrate the Crawling Process, several definitions are made as follows, and structures are shown in Figure 3.

The modules of $M (q_{d})$ that are completely or partially inside the pipe are called the inner directive modules.

The inner directive module at the entrance of the pipe is called the critical module. The two ends of the critical module are denoted as $p_{1}$ and $p_{2}$ .

Similarly, the modules of the crawling manipulator M that are completely or partially inside the pipe are called the inner modules.

The remaining modules of M are called the base modules. All the base modules form the base manipulator $M_{base}$ .

Figure 3.

The structures of (a) the directive configuration and (b) the crawling manipulator. The solid blue polygonal line represents the crawling manipulator. The dashed orange polygonal line represents the directive configuration.

In the Crawling Process, the base modules and the inner modules of M are controlled separately, as shown in Figure 4. By solving the inverse kinematics, the end of $M_{base}$ is controlled to move from $p_{1}$ to $p_{2}$ within time T, while keeping the orientation of the foremost module the same as that of the critical module. At the same time, the universal joint angles of the inner modules are concurrently changed to the corresponding joint angles of their next inner directive module. In this way, each inner module will repeat the motion of the foremost inner module, like the movement of a snake. In detail, suppose the number of modules to be inserted is N and the number of modules inserted is i, then the angles of the j-th inner module should be adjusted to the angles of the (N – i + j – 1)-th inner directive module.

As the modules move forward, the base modules in the front will become the inner modules, and the control over them will be changed, too.

For the next round of insertion, the new $M_{base}$ should adjust the orientation of its end, and at the same time, the newly inserted module should keep overlapping the critical module. This can be achieved through pose interpolation of $M_{base}$ , while planning the first inner module to move to the last inner directive module. The Crawling Process algorithm is summarized in Algorithm 1.

Figure 4.

The base manipulator moves along the direction of the critical module from point p1 until it reaches p2, as shown in Figure (b). Then adjust the direction of the end-effector of the base manipulator to be the same as that of the critical module, as shown in Figure (c); The base manipulator continues to move in the direction of the critical module, while the inner modules move to the corresponding inner directive module in the next module, as shown in Figure (d). Repeat the above until the end-effector reaches the target point, as shown in Figure (e, f).

Algorithm 1.

Crawling Process.

Evaluation of directive configurations

The crawling manipulator M will inevitably deviate from the directive configuration in the Crawling Process. Therefore, we introduce the maximum deviation distance D to evaluate the deviation.

For a specific directive configuration, the Crawling Process will correspondingly generate a planned motion. D is defined as the maximum distance between the inner modules and their corresponding inner directive modules at any moment in the planned motion.

Since each inner module will repeat the motion of the frontmost inner module, it is only necessary to analyze the motion of the foremost inner module to calculate the maximum deviation distance.

Sampling e points uniformly on the first inner module and calculating the distance from each point to the (N -i-1)-th and the (N-i)-th inner directive module, as shown in Figure 5. N is the number of inner directive modules, and i is the module inserted. The distance between the foremost inner module and its corresponding module is defined as the maximum of these distances

d (t, i, j) = d (s_{j} (q (t)), M_{N - i} (q_{d}))

where the $s_{j} (q (t))$ indicates the j-th sampling point in the configuration of the M at time t, the $M_{N - i} (q_{d})$ indicates the (N − i)-th inner directive module, and the d indicates the distance function which calculates the distance between the sampling points and the link of the inner directive module. Then the maximum deviation distance can be calculated as follows

D = max_{t,0 \leq i \leq N,1 \leq j \leq e} {d (t, i, j), d (t, i + 1, j)} .

The algorithm of the entire Crawling Control is summarized in Algorithm 2.

Figure 5.

The deviation in the Crawling Process. The solid blue polygonal line represents the crawling manipulator. The dashed orange polygonal line represents the directive configuration. The dashed red line represents the deviation distance from sampling points on the manipulator to the configuration.

Algorithm 2.

Crawling Control.

Swinging Search

In this section, details of the Swinging Search are elaborated. By applying RL to self-motion, the Swinging Search generates collision-free directive configurations, in which the end-effector has reached the goal pose. These generated directive configurations are used as the input of the Crawling Control which is discussed above. In this way, the search can make use of the environment information, so as not to explore blindly. Applying the self-motion also reduces the range of configurations that need to be searched.

The first subsection “State–action design” shows how the parameters are set to allow RL to help search, and the second subsection “Learning by DDPG” shows the specific RL algorithm used here and the full flow of Swinging Search.

State–action design

The design of state and action has a great influence on the convergence of RL, so the designed state and action should reflect the relationship between the manipulator and the environment.

To achieve self-motion to search for directive configurations, $\dot{q}$ derived from equation (10) is used to control the manipulator

\dot{q} = (I - J^{†} J) m

Intuitively, we design $M$ as the action $a_{t}$

a_{t} = m = {[\begin{matrix} m_{1} & m_{2} & \dots & m_{2 n} \end{matrix}]}^{T}

And the state $s_{t}$ is designed as

s_{t} = {[\begin{matrix} x & y & z & q_{1} & q_{2} & \dots & q_{2 n} \end{matrix}]}^{T}

where $q_{i}$ is the joint angle of the manipulator and $[\begin{matrix} x & y & z \end{matrix}]$ is the coordinate vector of the target position that the end-effector needs to reach.

A well-designed reward function can make the RL algorithm converge quickly. Therefore, a reward function consisting of penalties ${r^{'}}_{t}$ and ${r^{'}}^{'}_{t}$ is proposed. We design the number of modules that collided as a penalty ${r^{'}}_{t}$ so that the agent can obtain information about collisions. In this way, the agent tends to make the manipulator avoid obstacles when optimizing, so that it can find directive configurations more quickly

{r^{'}}_{t} = \sum_{i = 1}^{n} C_{i}

C_{i} = \{\begin{array}{l} - 1 & if the i -th module collided \\ 0 & else \end{array}

where reward function ${r^{'}}_{t}$ represents the number of modules that collided at time t and $C_{i}$ is the collision situation of the i-th module. If the i-th module collided with obstacles, the score of the i-th joint is −1, otherwise 0.

In addition, if a directive configuration can be found, the Swinging Search will also receive feedback from the Crawling Control, and the maximum deviation distance will also be used as a penalty ${r^{'}}^{'}_{t}$ , as follows

{r^{″}}_{t} = \{\begin{array}{l} - D / l & if a directive configuration is searched \\ 0 & else \end{array}

where D is the maximum deviation distance from the Crawling Control and l is the length of the manipulator link. Then the $r_{t}$ is as follows

r_{t} = \{\begin{array}{l} {r^{'}}_{t} + w \cdot {r^{'}}^{'}_{t} & if a directive configuration is searched \\ {r^{'}}_{t} & else \end{array}

where the scalar w is a scalar hyperparameter that is set according to the number of steps in an episode.

Learning by DDPG

The snake-like redundant manipulators used in this article have the characteristics of multiple DoFs and spatial continuity of action, so it is necessary to choose an algorithm that can solve the spatial problem of continuous action. Lillicrap et al. presented a deep deterministic policy gradient (DDPG)²³ algorithm which can operate over continuous action spaces, which is a good choice for this task. The DDPG algorithm uses the Actor–Critic framework, including the Actor network for selecting actions and the Critic network for evaluating the policy. The update of Actor network adopts the strategy gradient descent method, which is specifically expressed as

\nabla_{θ} J (θ) = \frac{1}{m^{'}} \sum_{i} \nabla_{a_{i}} Q (s_{i}, a_{i} | ω) \nabla_{θ} μ (s_{i} | θ)

where θ is the Actor network weight parameter, s is the state, a is the action, Q is the evaluation value, and $m^{'}$ is the number of samples of empirical data. The Critic network uses the mean square error loss function to update the parameters

Loss = \frac{1}{m^{'}} \sum_{i} {(r_{i} + γ Q^{'} (s_{i + 1}, a_{i + 1} | ω^{'}) - Q (s_{i}, a_{i} | ω))}^{2}

where ω is the Critic network weight parameter, γ is the reward discount factor.

The DDPG algorithm replicates the Actor network and the Critic network as the target network so that the agent can learn the task strategy stably, and its network weight parameters are expressed as $θ^{'}$ and $ω^{'}$ , respectively. The target network greatly enhances the stability of the learning process when the agent is training. The specific update method of the Actor target network is

θ^{'} \leftarrow τ θ + (1 - τ) θ^{'}

where τ is used to control the update speed of the Actor target network weight $θ^{'}$ .

Use the same method to update the Critic target network parameter $ω^{'}$

ω^{'} \leftarrow τ ω + (1 - τ) ω^{'}

In addition, the DDPG algorithm uses random noise to increase the exploration ability of the Actor network in the continuous action space to form a policy-map $μ^{'}$

μ^{'} (s_{t}) = μ (s_{t} | θ) + N^{'}

where $N^{'}$ is the random process of the noise.

Combined with the crawling algorithm, SSCC uses the feasible solution obtained by the RL algorithm to complete the exploring task. And each task can be used as empirical data to train the model. The RL model will be more convergent and the solution speed will be faster after many tasks.

The Swinging Search algorithm is summarized in Algorithm 3.

Algorithm 3.

Swinging Search.

Experiments and analysis

The performance of the SSCC is studied through some simulation experiments. This section presents the settings and the results of the experiments. The simulation is verified in two environments (the hollow pipe and the concentric pipe), and two manipulators are required to reach the target points which are in the pipe in each environment.

System description

The hardware equipment used in all experiments in this article is a MacBook Pro laptop (CPU M1, RAM 16 GB). All experiments are implemented in the software PyCharm CE, and the environment configuration was Python 3.9.7 and Pytorch 1.8.0. We use a robotics toolbox in python called robots-toolbox-python²⁴ to model and solve the kinematics of a hyper-redundant manipulator.

Testing case in the hollow pipe

In the first experiment, the hollow pipe is set as the obstacle. Parameters of the hollow pipe environment are shown in Table 2. And two snake-like redundant manipulators with different DoFs and equal total lengths are designed to verify the SSCC method. Parameters of manipulators are shown in Table 3.

Table 2.

The hollow pipe environment parameters.

Pipe radius (m)	Pipe length (m)	Distance between base and pipe (m)
0.2	1.4	1.0

Table 3.

The parameters of two snake-like redundant manipulators.

Parameter	Manipulator 1	Manipulator 2
DoFs	16	24
Length of links l (m)	0.3	0.2

DoF: degree of freedom.

We generated 30 target positions randomly in this experiment. And the efficiency of the SSCC is analyzed using directive configurations with different DoF manipulators, as shown in Table 4. The criterion for the success of the SSCC algorithm is defined as successfully searching for a set of collision-free configurations within 30s and no collision occurs during the crawling process for at least one directive configuration.

Table 4.

Experimental results in the hollow pipe environment.

	Manipulator 1	Manipulator 2
Target points	30	30
Success	25	29
Fail	5	1
Success rate (%)	83.3	96.7
Average time (s)	17.68	10.54

Experiments show that the success rate of a 24-DoF manipulator is higher than that of a 16-DoF manipulator. By thinking about it to the limit, this result can be explained intuitively. Assuming that the total length of the manipulator is constant, the more joints, the shorter the average length of each link. If there are infinite DoFs on the manipulator, then it can be seen as a soft rope. Increasing the number of DoFs over a fixed length makes the manipulator more flexible, allowing it to steer better to avoid obstacles and improve the success rate of the algorithm.

Due to the relatively single environment, the manipulator is less likely to collide during the Crawling Process. Most of the failed samples are caused by the search time exceeding the limit. In the early stage of training, the RL algorithm has not converged and may not be able to finish searching within a certain time.

For a more detailed elaboration, we randomly selected two samples from the two groups of experiments respectively as the explanation examples.

The target position is at $[\begin{matrix} 1.65 & - 0.11 & 0.04 \end{matrix}]$ . The two directive configurations of the two manipulators are searched by the Swinging Search algorithm, where the inner directive configuration consists of three modules and five modules, respectively. These configurations are shown in Figure 6, and the angles are not listed in detail due to the high redundancy of manipulators.

Figure 6.

The directive configurations (the solid orange polygonal line) for (a) 16 DoFs and (b) 24 DoFs manipulators in the hollow pipe. DoF: degree of freedom.

Then, the Crawling Control algorithm is verified in the simulation environment. The manipulators do not collide with the hollow pipe when manipulators enter the hollow pipe.

The joint angle curves of inner modules are shown in Figure 7, and the continuous curves without abrupt change indicate the stability of manipulator motion and verify the effectiveness. The deviation distances are shown in Figure 8. The deviation distances will be close to 0 when the inner manipulators overlap the corresponding inner directive modules. The maximum deviation distance then is calculated as the feedback to the DDPG model.

Figure 7.

The joint angle curves of the inner modules of manipulators in Figure 6. Since the two manipulators have too many DoFs, only the angle curves of joints entering the hollow pipe are shown in the figures. The joint angles curve of the (a) 16 DoFs manipulator and (b) 24 DoFs manipulator. DoF: degree of freedom.

Figure 8.

The deviation distances of two manipulators.

The movement processes of the snake-like redundant manipulators are shown in Figure 9.

Figure 9.

The Crawling Controls for (a) 16 DoFs and (b) 24 DoFs manipulators in the hollow pipe. DoF: degree of freedom.

Testing case in the concentric pipe

In the second experiment, the path planning using SSCC in the concentric pipe is studied. The parameters of the concentric pipe are shown in Table 5. The controlled 24 DoFs manipulator in this experiment is the same as the second manipulator used in the previous experiment.

Table 5.

The concentric pipe environment parameters.

Pipe	Pipe radius (m)	Pipe length (m)	Distance between base and pipe (m)
Outer pipe	0.2	1.4	1.0
Inner pipe	0.1	1.4	1.0

As the environment becomes more complex, the time limit of the algorithm in this experiment has been extended to 60s. The main reason for failure is the same as in the previous experiment. Most of the failed samples are caused by the search time exceeding the limit.

We generated 52 target positions randomly in the concentric pipe. The results are presented as shown in Table 6. The manipulator is requested to reach the target position between the two pipes without collision. The success rate is lower than that of the previous experiment in the pipe, but the SSCC can still complete most of the path planning tasks.

Table 6.

Experimental results in the concentric pipe environment.

Target points	Success	Fail	Success rate (%)	Average time (s)
52	50	2	96.1	52.61

We randomly select a target point $[\begin{matrix} 1.53 & 0.61 & - 0.11 \end{matrix}]$ to elaborate the SSCC method in this complex environment. The directive configuration is searched by the Swinging Search, and the inner directive configuration consists of four modules, as shown in Figure 10. The Crawling Process of the directive configuration is shown in Figure 11 in the complex environment. Similarly, the continuity of the crawling is verified by the joint angles curve shown in Figure 12. The deviation distance is shown in Figure 13.

Figure 10.

The directive configuration of the 24 DoFs manipulator in the concentric environment. DoF: degree of freedom.

Figure 11.

The Crawling Process of the 24 DoFs manipulator in the concentric pipe. The 24 DoFs manipulator starts at the critical module and crawls along inner directive modules like a snake until it reaches the target point. And in the process of movement, the manipulator did not collide with the inner and outer pipes. DoF: degree of freedom.

Figure 12.

The joint angles curve of the inner modules of the 24 DoFs manipulator in the concentric environment. DoF: degree of freedom.

Figure 13.

The maximum deviation distance change curve.

In summary, the high efficiency of the SSCC method is proved. In the pipe environment, changing the 16 DoFs manipulator to a 24-DoF manipulator with a shorter link length can improve search efficiency. In the concentric pipe environment, the hyper-redundant manipulator still has a high success rate for path planning.

Discussion

As the main advantage of the algorithm, we designed the Crawling Control inspired by snake crawling. It makes the Swinging Search only need planning the path of the end-effector instead of planning the body at the same time. What’s more, the purpose of the swinging searching algorithm using RL is not to plan the optimal solution but to speed up the searching of the collision-free configurations as the input of the Crawling Control.

The SSCC still has some limitations that we need to discuss in detail. In the Crawling Process, the inner modules will deviate from the directive configuration. And the deviation distance is cumulative. This means that the deviation distance may be greater if the more modules extend into the pipe. But this problem can be solved by using the maximum deviation distance as part of the reward function in the RL model. In addition, when facing a specific application in practice, it is enough for Swinging Search to plan a few feasible solutions to solve the problem before the RL model is fully trained. So that the planner can find solutions faster in future searches.

Another limitation of the SSCC algorithm is that it cannot plan the movement of the manipulator from one point in the pipe to another efficiently, since the algorithm just specifies the movement to get the manipulator into the pipe. For this former task, one possible solution is to plan the manipulator to move out of the pipe, by specifying the initial configuration as the directive configuration and then using the reverse Crawling Control. The reverse Crawling Control is the process that reverses the specified joint angles of the Crawling Control process in time series. The task now is transferred into one that the SSCC algorithm can solve. However, there is no guarantee that the manipulator will not collide when going out the pipe, because the directive configuration is now specified as the initial configuration of the movement, which cannot be re-specified if a collision occurs in the Crawling Control.

Conclusion

In this article, inspired by the morphology of snakes, we propose the SSCC algorithm for path planning. The snake-like redundant manipulator can be controlled by using SSCC for exploring complex environments. The snake-like redundant manipulator of the $n \times U$ open chains is designed to enhance adaptability. The snake’s crawling properties are imitated by the manipulator, which makes the manipulator move along the collision-free configuration in a special way. Furthermore, we utilize the maximum deviation distance to optimize the RL model, improving the success rate of planning collision-free configurations. Finally, a series of adequate experiments validate that the manipulators can explore complex environments with high success rates by using SSCC.

Footnotes

Acknowledgment

The authors thank Qi Liu for his help with this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by The Tianjin Science and Technology Program (19PTZWHZ00020).

ORCID iD

Ji Qu

References

Chirikjian

Burdick

. A hyper-redundant manipulator. IEEE Robot Autom Mag 1994; 1(4): 22–29.

Chirikjian

Burdick

. A modal approach to hyper-redundant manipulator kinematics. IEEE Trans Robot Autom 1994; 10(3): 343–354.

Mochiyama

. Shape control of manipulators with hyper degrees of freedoms. PhD Thesis, Japan Advanced Institute of Science and Technology, School of Information Science, Japan, 1998.

Yahya

Moghavvemi

Yang

, et al. Motion planning of hyper redundant manipulators based on a new geometrical method. In: 2009 IEEE international conference on industrial technology, Melbourne, Australia, 13–15 February 2019, pp. 1–5. New Jersey: IEEE.

Ratliff

Zucker

Bagnell

, et al. CHOMP: gradient optimization techniques for efficient motion planning. In: 2009 IEEE international conference on robotics and automation, Kobe, Japan, 12–17 May 2009, pp. 489–494. New Jersey: IEEE.

Zhao

Liu

. Motion planning of hyper-redundant manipulators based on ant colony optimization. In: 2016 IEEE international conference on robotics and biomimetics, Qingdao, China, 3–7 December 2016, pp. 1250–1255. New Jersey: IEEE.

Collins

Shen

. PASO: an integrated, scalable PSO-based optimization framework for hyper-redundant manipulator path planning and inverse kinematics. Information Sciences Institute Technical Report, 2016.

Wang

Zhu

Wang

, et al. An improved artificial potential field method of trajectory planning and obstacle avoidance for redundant manipulators. Int J Adv Robot Syst 2018; 15(5): 1729881418799562.

Liu

Zhang

Zhou

. Obstacle avoidance path planning of space manipulator based on improved artificial potential field method. J Inst Eng (India): Series C 2014; 95(1): 31–39.

10.

Kavraki

Svestka

Latombe

, et al. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans Robot Autom 1996; 12(4): 566–580.

11.

Ghajari

Mayorga

. Specialized PRM trajectory planning for hyper-redundant robot manipulators. WSEAS Trans Syst 2017; 16: 254–260.

12.

Hart

Nilsson

Raphael

. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Syst Sci Cybern 1968; 4(2): 100–107.

13.

Ryu

. A framework for unknown environment manipulator motion planning via model based real-time rehearsal. Berlin Heidelberg: Springer, 2011, pp. 30–35.

14.

Lavalle

Kuffner

. Rapidly-exploring random trees: progress and prospects. In: Algorithmic and computational robotics, Vol. 5, 1st ed. AK Peters/CRC Press, 2000, Wellesley, Mass, pp. 293–308.

15.

da Graça Marcos

Tenreiro Machado

Azevedo-Perdicoúlis

T-P

. Trajectory planning of redundant manipulators using genetic algorithms. Commun Nonlinear Sci Numer Simul 2009; 14(7): 2858–2869.

16.

Khoogar

Parker

. Obstacle avoidance of redundant manipulators using genetic algorithms. In: IEEE Proceedings of the SOUTHEASTCON ‘91, Williamsburg, USA, 7–10 April 1991, pp. 317–320. New Jersey: IEEE.

17.

Hua

Wang

, et al. Reinforcement learning-based collision-free path planner for redundant robot in narrow duct. J Intell Manuf 2021; 32: 471–482.

18.

Maaroof

. Self-motion control of kinematically redundant robot manipulators. Master Thesis, Izmir Institute of Technology, Turkey, 2012.

19.

Sciavicco

Siciliano

. A solution algorithm to the inverse kinematic problem for redundant manipulators. IEEE J Robot Autom 1988; 4(4): 403–410.

20.

Benzaoui

Chekireb

. Redundant robot manipulator control with obstacles avoidance using self-motion approach. In: Proceedings of the 13th IASTED international conference on robotics and applications, Wurzburg, Germany, 29–30 August 2007, pp. 21–26.

21.

Liegeois

. Automatic supervisory control of the configuration and behavior of multibody mechanisms. IEEE Trans Syst Man Cybern 1977; 7(12): 868–871.

22.

Lynch

Park

. Modern robotics. Cambridge: Cambridge University Press, 2017, p. 16.

23.

Lillicrap

Hunt

Pritzel

, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.

24.

Corke

Haviland

. Not your grandmother’s toolbox—the Robotics Toolbox reinvented for Python. In: 2021 IEEE international conference on robotics and automation (ICRA), Xi’an, China, 31 May–1 June 2021, pp. 11357–11363. New Jersey: IEEE.

A snake-inspired path planning algorithm based on reinforcement learning and self-motion for hyper-redundant manipulators

Abstract

Keywords

Introduction

Snake-like redundant manipulators

Mechanical structure design

Forward kinematics

Self-motion

Framework of SSCC

Crawling Control

Crawling Process

Evaluation of directive configurations

Swinging Search

State–action design

Learning by DDPG

Experiments and analysis

System description

Testing case in the hollow pipe

Testing case in the concentric pipe

Discussion

Conclusion

Footnotes

Acknowledgment

Declaration of conflicting interests

Funding

ORCID iD

References