Abstract
Redundant manipulators are flexible enough to adapt to complex environments, but their controller is also required to be specific for their extra degrees of freedom. Inspired by the morphology of snakes, we propose a path planning algorithm named Swinging Search and Crawling Control, which allows the snake-like redundant manipulators to explore in complex pipeline environments without collision. The proposed algorithm consists of the Swinging Search and the Crawling Control. In Swinging Search, a collision-free manipulator configuration that of the end-effector in the target point is found by applying reinforcement learning to self-motion, instead of designing joint motion. The self-motion narrows the search space to the null space, and the reinforcement learning makes the algorithm use the information of the environment, instead of blindly searching. Then in Crawling Control, the manipulator is controlled to crawl to the target point like a snake along the collision-free configuration. It only needs to search for a collision-free configuration for the manipulator, instead of searching collision-free configurations throughout the process of path planning. Simulation experiments show that the algorithm can complete path planning tasks of hyper-redundant manipulators in complex environments. The 16 degrees of freedom and 24 degrees of freedom manipulators can achieve 83.3% and 96.7% success rates in the pipe, respectively. In the concentric pipe, the 24 degrees of freedom manipulator has a success rate of 96.1%.
Keywords
Introduction
In recent decades, more and more researchers have been involved in the study of redundant manipulators to explore complex narrow environments, such as the natural gas pipes, sewer pipes, and tailpipes of airplanes. A manipulator is termed kinematically redundant when it possesses more degrees of freedom (DoFs) than it is needed to execute a given task. 1 The extra DoFs allow the manipulator to perform complex tasks, making path planning a challenge though.
To address the path planning of redundant manipulators, some early researchers proposed geometric methods. Chirikjian and Burdick 2 proposed the backbone curve approach for hyper-redundant robot kinematics. Later, a shape control geometric analysis method 3 was proposed by Mochiyama to control the whole shape of a manipulator. But such algorithms are computationally intensive. To reduce computational complexity, Yahya et al. 4 proposed geometric constraints that limit the angles between the adjacent links to be equal. However, the flexibility of the manipulators is also limited due to the introduced constraints.
Other researchers proposed the method based on optimization theory, in which obstacle avoidance can be achieved by optimizing the objective distance function between the manipulators and the obstacles. Zucker et al. 5 proposed the covariant Hamiltonian optimization for notion planning method which can quickly converge with generating a smooth collision-free trajectory. Zhao et al. 6 applied the ant colony algorithm to plan the collision-free optimal path of the end-effector. Collins and Shen 7 proposed an optimization framework path planning with swarm optimization to efficiently calculate the approximate solution of the collision-free path planning. Artificial potential field method is a classic algorithm for path planning of mobile robots, which can also be applied to the planning of manipulators. 8,9 In this algorithm, the obstacle produces repulsive virtual force and the target point generates attractive virtual force to the point, the resultant force of them push this point to achieve a collision-free path toward the target. These methods require calculating the distance from the manipulator to the obstacles and therefore also require modeling the surface of the obstacles, which is sometimes difficult to complete.
In addition, the random sampling methods are efficient path planning algorithms for redundant manipulators. The probabilistic roadmaps (PRM) method 10 proposed by Kavraki et al. randomly samples in the configuration space to search for collision-free configurations. 11 The sampling results are stored in a PRM. For any given start and goal configurations of the manipulator, the motion planning can be transformed into the problem of connecting the two corresponding nodes in the roadmap. 12 Compared with the optimization theory methods, the random sampling methods do not require figuring out the objective distance function, which reduces the computational complexity and improves the algorithm applicability. However, as the DoFs increase, methods of this kind are also computationally complex.
Facing the unknown environment in application, Um and Ryu 13 proposed RRT-Cabomba algorithm based on the sampling method RRT 14 which controls the IPA sensitive skin equipped robot to perceive and plan alternately. Further, they also adopted the bug algorithm in RRT-Cabomba to implement local minima avoidance.
With the development of artificial intelligence, applying genetic algorithm in the obstacle avoidance of redundant manipulators 15,16 is another method for solving the optimization problems in planning, without requiring to calculate the distance from the manipulator to the obstacles. As a search process, the genetic algorithm copies and mutates the parameters, and then filters them to find the parameters that make the fitness function better, so as to control the manipulator to avoid obstacles. Reinforcement learning (RL) has also been applied to the path planning of redundant manipulators. Hua et al. 17 applied the RL method to train the agent to plan the end-effector motion and self-motion 18 –20 separately. Combining the gradient projection method 21 decouples the motion of redundant manipulators into end-effector motion and self-motion. In this algorithm, the agent has to learn the entire planning process without making full use of the nature of self-motion, which still causes great computational difficulties.
In nature, snakes crawl along the path of their heads during crawling. Their bodies will not collide with the environment, as long as the heads can pass smoothly. Inspired by this biological phenomenon, this article proposes a path planning algorithm named Swinging Search and Crawling Control (SSCC) for snake-like redundant manipulators. This algorithm consists of the Swinging Search and the Crawling Control, allowing the redundant manipulators with repetitive modules to explore the complex pipeline environments.
By self-motion, the Swinging Search algorithm generates collision-free configurations as the directive configurations, in which the end-effector is in the target point. And then the manipulator can crawl into the pipe along the directive configurations by the Crawling Control. During crawling, each modular link repeats the motion of the foremost module.
The major contributions of this article are as follows. Inspired by the biological characteristics of snakes, this article innovatively proposes the SSCC algorithm for snake-like redundant manipulators to be adapted to narrow environments such as pipes. The SSCC algorithm has low computational complexity, since only one directive configuration needs to be searched. The SSCC algorithm has good expansibility and adaptability for modular snake-like redundant manipulators with n universal joints.
Snake-like redundant manipulators
In this section, each subsection presents the mechanical structure description, the forward kinematics, and the self-motion of the snake-like redundant manipulators in turn. The formulation presented here is to elaborate on what the manipulators look like and how to control them.
Mechanical structure design
In this article, to be analogous to snakes in morphology, the modular redundant manipulators with n universal joints are designed. The manipulators are

The structure of the general snake-like redundant manipulator with 12 joints.
The D-H parameters of the snake-like redundant manipulators.
D-H: Denavit–Hartenberg.
By the D-H parameters, the homogeneous transformation matrices for kinematics are expressed as
Forward kinematics
In forward kinematics, the pose of the end-effector
where
Self-motion
The i-th column of the Jacobian matrix
where
The pseudo inverse of Jacobian matrix is defined as
With the Jacobian matrix, the end-effector velocity vector
where
For control purposes, it is common to solve
where the term
All the homogeneous solutions form the null space of
where k is a scale factor,
By specifying
where
Framework of SSCC
In this article, a path planning algorithm named SSCC is proposed. Through this algorithm, the modular redundant manipulators can reach into the complex pipe environment without collision, allowing the end-effector to achieve the goal pose.
SSCC algorithm consists of two units, the Swinging Search and the Crawling Control. The Swinging Search generates collision-free configurations by applying RL to self-motion. The generated configurations are named directive configurations, in which the end-effector has reached the goal pose. In the Crawling Control, the controlled redundant manipulator crawls into the pipe along the directive configuration like a snake. The flowchart of SSCC is shown in Figure 2.

The framework of Swinging Search and Crawling Control.
In Figure 2,
Since the length of the rigid link in the module cannot be ignored, the manipulator will inevitably deviate from the directive configuration during the Crawling Process, which may also cause the manipulator to collide. The Crawling Process is deterministic, which means that the degree of deviation can be calculated based on the directive configuration. Therefore, we propose the maximum deviation distance to evaluate the directive configuration to train the agent in RL, so that the Swinging Search can generate the directive configuration with a smaller maximum deviation distance to reduce the possibility of collision during crawling.
Both the Swinging Search and the Crawling Control include collision detection. In the Swinging Search, collision detection is used to determine whether to terminate the search. In the Crawling Control, if a collision is detected, then the directive configuration is invalid, and the Swinging Search should regenerate a new directive configuration.
Crawling Control
The Crawling Process refers to the process in which the manipulator
The flow of the Crawling Process is shown in the subsection “Crawling Process,” which specifies how the manipulator is controlled given a certain directive configuration.
Then in the subsection “Evaluation of directive configurations,” the evaluation method is proposed, in which the maximum deviation distance is designed to measure the deviation of the controlled manipulator from the directive configuration. For a specific configuration, the Crawling Process produces a unique control process. So the maximum deviation distance is the evaluation indicator of directive configurations. The smaller the maximum deviation distance, the better the directive configuration. This property can be used in the reward function of RL in Swinging Search, to help generate better directive configurations, so as to reduce the possibility of collisions in the Crawling Process.
Crawling Process
To illustrate the Crawling Process, several definitions are made as follows, and structures are shown in Figure 3. The modules of The inner directive module at the entrance of the pipe is called the critical module. The two ends of the critical module are denoted as Similarly, the modules of the crawling manipulator The remaining modules of

The structures of (a) the directive configuration and (b) the crawling manipulator. The solid blue polygonal line represents the crawling manipulator. The dashed orange polygonal line represents the directive configuration.
In the Crawling Process, the base modules and the inner modules of
As the modules move forward, the base modules in the front will become the inner modules, and the control over them will be changed, too.
For the next round of insertion, the new

The base manipulator moves along the direction of the critical module from point p1 until it reaches p2, as shown in Figure (b). Then adjust the direction of the end-effector of the base manipulator to be the same as that of the critical module, as shown in Figure (c); The base manipulator continues to move in the direction of the critical module, while the inner modules move to the corresponding inner directive module in the next module, as shown in Figure (d). Repeat the above until the end-effector reaches the target point, as shown in Figure (e, f).
Crawling Process.
Evaluation of directive configurations
The crawling manipulator
For a specific directive configuration, the Crawling Process will correspondingly generate a planned motion. D is defined as the maximum distance between the inner modules and their corresponding inner directive modules at any moment in the planned motion.
Since each inner module will repeat the motion of the frontmost inner module, it is only necessary to analyze the motion of the foremost inner module to calculate the maximum deviation distance.
Sampling e points uniformly on the first inner module and calculating the distance from each point to the (N -i-1)-th and the (N-i)-th inner directive module, as shown in Figure 5. N is the number of inner directive modules, and i is the module inserted. The distance between the foremost inner module and its corresponding module is defined as the maximum of these distances
where the
The algorithm of the entire Crawling Control is summarized in Algorithm 2.

The deviation in the Crawling Process. The solid blue polygonal line represents the crawling manipulator. The dashed orange polygonal line represents the directive configuration. The dashed red line represents the deviation distance from sampling points on the manipulator to the configuration.
Crawling Control.
Swinging Search
In this section, details of the Swinging Search are elaborated. By applying RL to self-motion, the Swinging Search generates collision-free directive configurations, in which the end-effector has reached the goal pose. These generated directive configurations are used as the input of the Crawling Control which is discussed above. In this way, the search can make use of the environment information, so as not to explore blindly. Applying the self-motion also reduces the range of configurations that need to be searched.
The first subsection “State–action design” shows how the parameters are set to allow RL to help search, and the second subsection “Learning by DDPG” shows the specific RL algorithm used here and the full flow of Swinging Search.
State–action design
The design of state and action has a great influence on the convergence of RL, so the designed state and action should reflect the relationship between the manipulator and the environment.
To achieve self-motion to search for directive configurations,
Intuitively, we design
And the state
where
A well-designed reward function can make the RL algorithm converge quickly. Therefore, a reward function consisting of penalties
where reward function
In addition, if a directive configuration can be found, the Swinging Search will also receive feedback from the Crawling Control, and the maximum deviation distance will also be used as a penalty
where D is the maximum deviation distance from the Crawling Control and l is the length of the manipulator link. Then the
where the scalar w is a scalar hyperparameter that is set according to the number of steps in an episode.
Learning by DDPG
The snake-like redundant manipulators used in this article have the characteristics of multiple DoFs and spatial continuity of action, so it is necessary to choose an algorithm that can solve the spatial problem of continuous action. Lillicrap et al. presented a deep deterministic policy gradient (DDPG) 23 algorithm which can operate over continuous action spaces, which is a good choice for this task. The DDPG algorithm uses the Actor–Critic framework, including the Actor network for selecting actions and the Critic network for evaluating the policy. The update of Actor network adopts the strategy gradient descent method, which is specifically expressed as
where θ is the Actor network weight parameter, s is the state, a is the action, Q is the evaluation value, and
where ω is the Critic network weight parameter, γ is the reward discount factor.
The DDPG algorithm replicates the Actor network and the Critic network as the target network so that the agent can learn the task strategy stably, and its network weight parameters are expressed as
where τ is used to control the update speed of the Actor target network weight
Use the same method to update the Critic target network parameter
In addition, the DDPG algorithm uses random noise to increase the exploration ability of the Actor network in the continuous action space to form a policy-map
where
Combined with the crawling algorithm, SSCC uses the feasible solution obtained by the RL algorithm to complete the exploring task. And each task can be used as empirical data to train the model. The RL model will be more convergent and the solution speed will be faster after many tasks.
The Swinging Search algorithm is summarized in Algorithm 3.
Swinging Search.
Experiments and analysis
The performance of the SSCC is studied through some simulation experiments. This section presents the settings and the results of the experiments. The simulation is verified in two environments (the hollow pipe and the concentric pipe), and two manipulators are required to reach the target points which are in the pipe in each environment.
System description
The hardware equipment used in all experiments in this article is a MacBook Pro laptop (CPU M1, RAM 16 GB). All experiments are implemented in the software PyCharm CE, and the environment configuration was Python 3.9.7 and Pytorch 1.8.0. We use a robotics toolbox in python called robots-toolbox-python 24 to model and solve the kinematics of a hyper-redundant manipulator.
Testing case in the hollow pipe
In the first experiment, the hollow pipe is set as the obstacle. Parameters of the hollow pipe environment are shown in Table 2. And two snake-like redundant manipulators with different DoFs and equal total lengths are designed to verify the SSCC method. Parameters of manipulators are shown in Table 3.
The hollow pipe environment parameters.
The parameters of two snake-like redundant manipulators.
DoF: degree of freedom.
We generated 30 target positions randomly in this experiment. And the efficiency of the SSCC is analyzed using directive configurations with different DoF manipulators, as shown in Table 4. The criterion for the success of the SSCC algorithm is defined as successfully searching for a set of collision-free configurations within 30s and no collision occurs during the crawling process for at least one directive configuration.
Experimental results in the hollow pipe environment.
Experiments show that the success rate of a 24-DoF manipulator is higher than that of a 16-DoF manipulator. By thinking about it to the limit, this result can be explained intuitively. Assuming that the total length of the manipulator is constant, the more joints, the shorter the average length of each link. If there are infinite DoFs on the manipulator, then it can be seen as a soft rope. Increasing the number of DoFs over a fixed length makes the manipulator more flexible, allowing it to steer better to avoid obstacles and improve the success rate of the algorithm.
Due to the relatively single environment, the manipulator is less likely to collide during the Crawling Process. Most of the failed samples are caused by the search time exceeding the limit. In the early stage of training, the RL algorithm has not converged and may not be able to finish searching within a certain time.
For a more detailed elaboration, we randomly selected two samples from the two groups of experiments respectively as the explanation examples.
The target position is at

The directive configurations (the solid orange polygonal line) for (a) 16 DoFs and (b) 24 DoFs manipulators in the hollow pipe. DoF: degree of freedom.
Then, the Crawling Control algorithm is verified in the simulation environment. The manipulators do not collide with the hollow pipe when manipulators enter the hollow pipe.
The joint angle curves of inner modules are shown in Figure 7, and the continuous curves without abrupt change indicate the stability of manipulator motion and verify the effectiveness. The deviation distances are shown in Figure 8. The deviation distances will be close to 0 when the inner manipulators overlap the corresponding inner directive modules. The maximum deviation distance then is calculated as the feedback to the DDPG model.

The joint angle curves of the inner modules of manipulators in Figure 6. Since the two manipulators have too many DoFs, only the angle curves of joints entering the hollow pipe are shown in the figures. The joint angles curve of the (a) 16 DoFs manipulator and (b) 24 DoFs manipulator. DoF: degree of freedom.

The deviation distances of two manipulators.
The movement processes of the snake-like redundant manipulators are shown in Figure 9.

The Crawling Controls for (a) 16 DoFs and (b) 24 DoFs manipulators in the hollow pipe. DoF: degree of freedom.
Testing case in the concentric pipe
In the second experiment, the path planning using SSCC in the concentric pipe is studied. The parameters of the concentric pipe are shown in Table 5. The controlled 24 DoFs manipulator in this experiment is the same as the second manipulator used in the previous experiment.
The concentric pipe environment parameters.
As the environment becomes more complex, the time limit of the algorithm in this experiment has been extended to 60s. The main reason for failure is the same as in the previous experiment. Most of the failed samples are caused by the search time exceeding the limit.
We generated 52 target positions randomly in the concentric pipe. The results are presented as shown in Table 6. The manipulator is requested to reach the target position between the two pipes without collision. The success rate is lower than that of the previous experiment in the pipe, but the SSCC can still complete most of the path planning tasks.
Experimental results in the concentric pipe environment.
We randomly select a target point

The directive configuration of the 24 DoFs manipulator in the concentric environment. DoF: degree of freedom.

The Crawling Process of the 24 DoFs manipulator in the concentric pipe. The 24 DoFs manipulator starts at the critical module and crawls along inner directive modules like a snake until it reaches the target point. And in the process of movement, the manipulator did not collide with the inner and outer pipes. DoF: degree of freedom.

The joint angles curve of the inner modules of the 24 DoFs manipulator in the concentric environment. DoF: degree of freedom.

The maximum deviation distance change curve.
In summary, the high efficiency of the SSCC method is proved. In the pipe environment, changing the 16 DoFs manipulator to a 24-DoF manipulator with a shorter link length can improve search efficiency. In the concentric pipe environment, the hyper-redundant manipulator still has a high success rate for path planning.
Discussion
As the main advantage of the algorithm, we designed the Crawling Control inspired by snake crawling. It makes the Swinging Search only need planning the path of the end-effector instead of planning the body at the same time. What’s more, the purpose of the swinging searching algorithm using RL is not to plan the optimal solution but to speed up the searching of the collision-free configurations as the input of the Crawling Control.
The SSCC still has some limitations that we need to discuss in detail. In the Crawling Process, the inner modules will deviate from the directive configuration. And the deviation distance is cumulative. This means that the deviation distance may be greater if the more modules extend into the pipe. But this problem can be solved by using the maximum deviation distance as part of the reward function in the RL model. In addition, when facing a specific application in practice, it is enough for Swinging Search to plan a few feasible solutions to solve the problem before the RL model is fully trained. So that the planner can find solutions faster in future searches.
Another limitation of the SSCC algorithm is that it cannot plan the movement of the manipulator from one point in the pipe to another efficiently, since the algorithm just specifies the movement to get the manipulator into the pipe. For this former task, one possible solution is to plan the manipulator to move out of the pipe, by specifying the initial configuration as the directive configuration and then using the reverse Crawling Control. The reverse Crawling Control is the process that reverses the specified joint angles of the Crawling Control process in time series. The task now is transferred into one that the SSCC algorithm can solve. However, there is no guarantee that the manipulator will not collide when going out the pipe, because the directive configuration is now specified as the initial configuration of the movement, which cannot be re-specified if a collision occurs in the Crawling Control.
Conclusion
In this article, inspired by the morphology of snakes, we propose the SSCC algorithm for path planning. The snake-like redundant manipulator can be controlled by using SSCC for exploring complex environments. The snake-like redundant manipulator of the
Footnotes
Acknowledgment
The authors thank Qi Liu for his help with this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by The Tianjin Science and Technology Program (19PTZWHZ00020).
