Cognition-based hybrid path planning for autonomous underwater vehicle target following

Abstract

Intelligent path planning is one of the key techniques that allow autonomous underwater vehicles to carry out target detection, environmental survey and similar tasks. To realize automatic motion planning, an intelligent cognitive architecture for autonomous underwater vehicle motion planning is proposed for complicated target detection and mobile target following in disturbed environments. A novel fuzzy rules optimization algorithm based on the adaptive fusion of ant colony optimization and particle swarm optimization is proposed to generate optimized fuzzy rules. Through this optimization algorithm, the preliminary fuzzy rules can be optimized to realize intelligent motion planning for complicated operation tasks. Experiments on channel following for wall detection and on mobile target following in the oceanic environment have verified the validity of the path planning method in detection and operation tasks.

Keywords

Autonomous underwater vehicle, intelligent motion planning, fuzzy rules optimization, particle swarm optimization, ant colony optimization

Introduction

Currently, autonomous underwater vehicles (AUVs) are increasingly attractive for various underwater tasks such as environment exploration,¹ seabed survey,² harbour protection³ and submarine search and rescue.⁴ With the development of artificial intelligence and computer science, autonomous path planning can not only help the vehicle initialize the operation route but also let it react reasonably to different states of obstacles and targets during operation. Path planning methods include global path planning and local path planning. The global path is generated on the basis of preliminary environmental understanding and modelling.⁵ Local path planning, on the other hand, is responsible for generating a local or regional path to handle unknown and temporary obstacles under environmental disturbances.⁶

In the last decade, a variety of solutions have been developed for the underwater vehicle path planning problems.⁷ The fast marching global plan algorithm includes graph-based algorithm,⁸ heuristic search algorithm,⁹ evolutionary optimization algorithm¹⁰ and so on. Zhuang et al.¹¹ proposed a hybrid optimization algorithm to integrate particle swarm optimization (PSO) algorithm with Legendre pseudo spectral method (LPM) for AUV operating in cluttered and uncertain environments, and the searching process is accelerated through LPM. Cheng et al.² proposed a dynamic programming-based genetic path planner algorithm in which the random-based crossover operator is replaced with a deterministic crossover operator.

Local path planning algorithms include the artificial potential field (APF),¹² the fuzzy path planner¹³ and so on. Compared with surface vehicles,¹⁴,¹⁵ the local path planning of an AUV is more liable to be affected by environmental disturbance. Potential field algorithms are the most widely applied because they generate the forward path by constructing an APF that weighs the influences of obstacles and goal points, with better consistency and convergence for AUV local path planning.¹⁶ However, local minima may stop the objective from being achieved. In order to overcome this problem, Melingui et al.¹⁷ proposed a novel planning and navigation approach that integrates APF and fuzzy logic into a common framework, which utilizes both heuristic knowledge and sampled input–output data pairs. Park et al.¹² presented an advanced fuzzy potential field method for mobile robot obstacle avoidance. The method first generates the repulsive forces of surrounding obstacles and then handles linguistic variables with fuzzy rules. However, an unknown and complicated coastal environment often causes difficulties in the formulation of fuzzy rules.¹⁸

In applications, however, AUV path planning often confronts complicated difficulties. Global path planning is intended to fulfil missions such as regional coverage, target search and tracking¹⁹,²⁰ in limited time. Local path planning is intended to evade obstacles and to make close observations of, and even operate on, a specified target, which may sometimes meet unpredicted difficulties and require unbounded effort.²¹ The AUV should adjust its route and coordinate the global and local tasks while taking into account uncertain disturbances, operation time, local sampling cost and so on.

In comparison with the reflection and reasoning of human beings, the intelligence of AUVs is still in its early stages. Fuzzy reasoning based on invariable rules cannot adapt to varied and complicated environments.²² The cognition of the human brain, on the contrary, operates on the basis of working memory and generates different fuzzy rules in the face of different conditions²³ through the integration of perceiving, understanding, reasoning and learning.²⁴ This article proposes a novel intelligent AUV path planning algorithm on the basis of a cognitive architecture for unknown obstacle avoidance, target following and detection. The contributions of this study are as follows:

This study has proposed an intelligent fuzzy path planning method on the basis of cognitive architecture for AUV target following to avoid unknown obstacles in the disturbance environment.

A novel adaptive ant colony optimization (ACO) and PSO algorithm-based fuzzy rules optimization algorithm has been proposed to generate optimized fuzzy rules to obtain optimal path for proposed operation tasks.

Canal wall following and oceanic mobile target following experiments have verified the validity of path planning method in the implementation of detection and operation tasks.

The rest of this study is organized as follows. The cognitive architecture for AUV path planning is presented in the second section. A novel intelligent path planning method with the adaptive ACO-PSO algorithm for fuzzy rule optimization is proposed in the third section. Tank and typical environmental experiments are discussed and analysed in the fourth section. Conclusions are drawn in the last section.

Cognitive architecture for AUV path planning

The proposed cognitive architecture has been designed on the basis of the torpedo-shaped intelligent AUV in Figure 1. The AUV has a length of 5.5 m, a maximum gyration diameter of 0.63 m, a maximum submergence depth of 2000 m, a maximum cruising speed of 5 kn and a maximum range of 350 km. It is equipped with a ‘Tritech Super SeaKing DST’ digital scanning sonar, a 740 TV line camera, two vertical channel thrusters, one propeller, vertical rudders, horizontal wings and so on. The exploration range of the digital scanning sonar is 300 m, with a vertical beam width of 20° and a horizontal beam width of 3°. Navigation and position reckoning are realized through a Doppler Velocity Log (DVL) and an inertial navigation system. The operating system runs on an embedded PC104 platform, with C++ as the software language.

Figure 1.

Torpedo shape and intelligent AUV construction. (1) Digital scanning sonar. (2) Vertical channel thruster. (3) GPS and wireless antenna. (4) Propeller. (5) Vertical rudders and horizontal wings. (6) Multi-beam Bathymeter. (7) Sidescan sonar. (8) Underwater CCD. AUV: autonomous underwater vehicle.

The proposed cognitive architecture of the AUV includes a high-level knowledge-based path planner, a low-level vehicle control and executive module and an environmental perception module (see Figure 2). The high-level autonomous planner and the low-level control module interact with each other simultaneously through environmental perception and state awareness. The executive module fulfils the AUV control commands with feedback on the current situation.

Figure 2.

Cognitive architecture for AUV path planning. AUV: autonomous underwater vehicle.

The high-level knowledge-based path planner includes the global path planner and the local path planner. The global waypoints are initialized according to the marine map and mission tasks before the mission starts. The global path planner is realized through a revised heuristic search and coverage algorithm.²⁵ Through the global path planner, the AUV tasks and path waypoints are arranged and organized.

In order to realize local path planning for target detection and obstacle avoidance in the disturbance environment, the oceanic environment is modelled not only with the digital scanning sonar of mid and far distant obstacles but also with a camera carefully observing the ambient environment and operation target. Fuzzy rules are initially proposed and optimized through adaptive ACO and PSO rule optimization algorithm.

Intelligent path planning method

Obstacles and target modelling

The digital scanning sonar is the device that can not only detect the target and the obstacle but also measure their distances from the vehicle. The range of scanning sonar is 20° with the radius as R. The representation of sonar detection space is important for environmental modelling and target tracking. In the sonar detection space, an occupancy grid map of environment has been constructed. Each cell of the map in the discrete region is characterized through two states: empty and occupied. The state is obtained through the projection of sonar profile. From the data flow of projection profile, one can obtain whether the cell is occupied or empty. The sonar can be applied to model and find uncertain static and dynamic object with Gaussian distribution. The operation profile of scanning sonar is shown in Figure 3.

Figure 3.

The operation range of the digital scanning sonar (the space of sonar includes seven radial directions, that is, left (l), front small left (Sfl), front left (fl), front (f), front right (fr), front large right (fr), right).

Through Dempster–Shafer theory, the goal of the occupancy grid is to determine the possibilities of each cell being empty or occupied. If we define O as the state of occupation by obstacles and E as the empty state, one obtains the set of discernment Θ as

Θ ={O, E}

Each grid in the workspace is defined by the cell state $U (i, j)$ , which is used to describe the assignment of basic probability $m_{i, j}$ to each label in Ω

Ω = {ϕ, E, O, {E, O}} = 2^{Θ}

Each cell $U (i, j)$ in the grid is defined as

m_{i, j} (ϕ) + m_{i, j} (E) + m_{i, j} (O) + m_{i, j} ({E, O}) = 1

with every cell in this map initialized as

m_{i, j} (O) = m_{i, j} (E) = 0 and m_{i, j} ({E, O}) = 1
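The initialization above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary representation of a cell and the use of Dempster's rule of combination to fuse a new sonar reading are all assumptions introduced here, since the text only specifies the mass constraint and the initial values.

```python
from itertools import product

# Focal sets over the frame of discernment Θ = {O, E}
EMPTY_SET = frozenset()
E = frozenset({"E"})
O = frozenset({"O"})
EO = frozenset({"E", "O"})


def init_cell():
    """Initial cell state: total ignorance, all mass on {E, O}."""
    return {EMPTY_SET: 0.0, E: 0.0, O: 0.0, EO: 1.0}


def init_grid(rows, cols):
    """Occupancy grid in which every cell U(i, j) starts as unknown."""
    return [[init_cell() for _ in range(cols)] for _ in range(rows)]


def combine(m1, m2):
    """Dempster's rule of combination (assumed here as the fusion step)."""
    fused = {EMPTY_SET: 0.0, E: 0.0, O: 0.0, EO: 0.0}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            fused[intersection] += mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: masses cannot be combined")
    # Renormalise by the non-conflicting mass; the empty set keeps zero mass
    return {k: (v / (1.0 - conflict) if k else 0.0) for k, v in fused.items()}
```

With a hypothetical sonar reading such as `{EMPTY_SET: 0.0, E: 0.1, O: 0.6, EO: 0.3}`, combining it with a freshly initialized cell returns the reading itself, and combining the same reading a second time raises the mass on O, which is the behaviour one would expect from repeated consistent evidence.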

In the path planning process of potential field, the attractive forces come from the goal and the observing targets, while the repulsive forces come from the obstacles. On the basis of sonar profile, the repulsive force between the vehicle and the obstacles can be obtained to prevent collisions through the sum of gradients of the potential field

f_{rep} = \begin{cases} k_{rep} \sum_{i = 1}^{m} \left( \frac{1}{‖ P_{a} - P_{o i} ‖} - \frac{1}{d_{max}} \right) \frac{P_{a} - P_{o i}}{‖ P_{a} - P_{o i} ‖} & \text{if } ‖ P_{a} - P_{o i} ‖ < d_{max} \\ 0 & \text{otherwise} \end{cases}

where m is the number of ambient obstacles, $k_{rep}$ is the positive constant of the repulsive forces and $d_{max}$ is the obstacle influence distance. From equation (5), the repulsive forces are the sum of several subrepulsive ones. From the obstacles modelling through scanning sonar, three components of obstacle information are expressed with obstacle position ( $P_{o i} = (x_{o i}, y_{o i}, z_{o i})$ ). If the current position of AUV is set as $P_{a} = (x_{a}, y_{a}, z_{a})$ , the distance and angular relationship between the AUV and the obstacle can be described as

\left\{ \begin{array}{l} | d_{A - O b} | = \sqrt{{(x_{a} - x_{o i})}^{2} + {(y_{a} - y_{o i})}^{2} + {(z_{a} - z_{o i})}^{2}} \\ ψ_{A - O b} = \arctan \frac{y_{a} - y_{o i}}{x_{a} - x_{o i}} \\ θ_{A - O b} = \arctan \frac{z_{a} - z_{o i}}{\sqrt{{(x_{a} - x_{o i})}^{2} + {(y_{a} - y_{o i})}^{2}}} \end{array} \right.
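The repulsive force and the relative geometry between the AUV and an obstacle can be illustrated with the following Python sketch. It is written under stated assumptions: the function names are hypothetical, the distance condition is applied to each obstacle separately, and `math.atan2` is used in place of the plain arctangent of the ratio so that the angles remain defined when the horizontal offset is zero.

```python
import math


def norm(v):
    """Euclidean norm of a 3D vector given as a sequence."""
    return math.sqrt(sum(c * c for c in v))


def repulsive_force(p_a, obstacles, k_rep, d_max):
    """Sum of the sub-repulsive forces of obstacles closer than d_max."""
    force = [0.0, 0.0, 0.0]
    for p_o in obstacles:
        diff = [a - o for a, o in zip(p_a, p_o)]  # P_a - P_oi
        dist = norm(diff)
        if 0.0 < dist < d_max:
            gain = k_rep * (1.0 / dist - 1.0 / d_max)
            force = [f + gain * d / dist for f, d in zip(force, diff)]
    return force


def range_and_bearing(p_a, p_o):
    """Distance, horizontal angle and vertical angle between AUV and obstacle."""
    dx, dy, dz = (a - o for a, o in zip(p_a, p_o))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    psi = math.atan2(dy, dx)                     # angle in the horizontal plane
    theta = math.atan2(dz, math.hypot(dx, dy))   # elevation angle
    return dist, psi, theta
```

For example, with the AUV at the origin, a single obstacle at (2, 0, 0), `k_rep = 1` and `d_max = 4`, the gain is 1/2 − 1/4 = 0.25 and the force points along the negative x axis, away from the obstacle. An obstacle farther away than `d_max` contributes nothing.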

In the path planning process, the obstacles must be avoided once they are found, while the targets should be observed and detected with the existence possibilities. According to the theory of potential field, the attractive forces come from the goal and the observing targets. On the other hand, the position of observed target can be described with $P_{t} = (x_{t}, y_{t}, z_{t})$ , target radius rate ( $Θ_{t r}$ ) and target uncertain rate ( $Θ_{U r}$ ). The target uncertainty is modelled with uncertain radius which is varied with normal distribution $Θ_{t r} \sim (P_{t}, σ)$ . If M ₁, M ₂ and M ₃ are used to define the uncertainty difference on the object motion, the relationship between target change rate from t − 1 to t can be described through normal distribution and probability density functions of radius rate as follows

Θ_{t r} (t) = M_{1} Θ_{t r} (t - 1) + M_{2} X_{(t - 1)} + M_{3}

where $X_{(t - 1)} = N (0, σ)$ is the normal distribution for the target, $M_{2} = {[\begin{matrix} 0 & 1 & 1 \end{matrix}]}^{T}$ , $M_{3} = {[\begin{matrix} 0 & 0 & Θ_{U r} \end{matrix}]}^{T}$ and

M_{1} = [\begin{matrix} 1 & Θ_{U r} (t) & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}]

In the local planning, a novel multilayered potential field approach has been proposed for the target detection. The attractive force includes the goal and the observed target

f_{att} = \begin{cases} k_{att} \left( λ \operatorname{sgn} (P_{g} - P_{a}) {‖ P_{g} - P_{a} ‖}^{2} + (1 - λ) Θ_{U r} (t) \operatorname{sgn} (P_{t} - P_{a}) {‖ P_{t} - P_{a} ‖}^{2} \right) & \text{if } ‖ P_{g} - P_{a} ‖ > γ_{g} \text{ and } ‖ P_{t} - P_{a} ‖ > γ_{t} \\ k_{att} \left( λ \operatorname{sgn} (P_{g} - P_{a}) ‖ P_{g} - P_{a} ‖ + (1 - λ) Θ_{U r} (t) \operatorname{sgn} (P_{t} - P_{a}) {‖ P_{t} - P_{a} ‖}^{2} \right) & \text{if } ‖ P_{g} - P_{a} ‖ \leq γ_{g} \text{ and } ‖ P_{t} - P_{a} ‖ > γ_{t} \\ k_{att} \left( λ \operatorname{sgn} (P_{g} - P_{a}) ‖ P_{g} - P_{a} ‖ + (1 - λ) Θ_{U r} (t) \operatorname{sgn} (P_{t} - P_{a}) ‖ P_{t} - P_{a} ‖ \right) & \text{if } ‖ P_{g} - P_{a} ‖ \leq γ_{g} \text{ and } ‖ P_{t} - P_{a} ‖ \leq γ_{t} \\ k_{att} \left( λ \operatorname{sgn} (P_{g} - P_{a}) {‖ P_{g} - P_{a} ‖}^{2} + (1 - λ) Θ_{U r} (t) \operatorname{sgn} (P_{t} - P_{a}) ‖ P_{t} - P_{a} ‖ \right) & \text{if } ‖ P_{g} - P_{a} ‖ > γ_{g} \text{ and } ‖ P_{t} - P_{a} ‖ \leq γ_{t 2} \end{cases}

where $k_{att}$ is the positive constant of the attractive forces, $P_{g} = (x_{g}, y_{g}, z_{g})$ represents the goal position, $P_{t} = (x_{t}, y_{t}, z_{t})$ represents the target position, λ represents the coefficient of the attractive forces between the target and the goal and $γ_{g}$ and $γ_{t}$ represent the distance limit for the layer of attractive force.
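The layered attractive force can be sketched in Python as follows. The sketch rests on two assumptions that are not stated in the text: sgn is applied componentwise to the difference vector, and a single threshold $γ_{t}$ is used for the target layer in all four cases, so that the four cases reduce to choosing a quadratic or linear magnitude independently for the goal term and the target term. The function and parameter names are hypothetical.

```python
import math


def sgn(x):
    """Sign of a scalar: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def attractive_force(p_a, p_g, p_t, k_att, lam, theta_ur, gamma_g, gamma_t):
    """Layered attraction towards the goal P_g and the observed target P_t."""
    to_goal = [g - a for g, a in zip(p_g, p_a)]
    to_target = [t - a for t, a in zip(p_t, p_a)]
    dist_g = math.sqrt(sum(c * c for c in to_goal))
    dist_t = math.sqrt(sum(c * c for c in to_target))
    # Quadratic magnitude outside the layer limit, linear magnitude inside it
    mag_g = dist_g ** 2 if dist_g > gamma_g else dist_g
    mag_t = dist_t ** 2 if dist_t > gamma_t else dist_t
    return [
        k_att * (lam * sgn(g) * mag_g + (1.0 - lam) * theta_ur * sgn(t) * mag_t)
        for g, t in zip(to_goal, to_target)
    ]
```

Switching from the quadratic to the linear magnitude inside the layer limit keeps the attraction from vanishing too quickly as the AUV approaches the goal or the target, which is one plausible reading of why the force is layered.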

Therefore, the artificial force can be obtained as

f = f_{rep} + f_{att}

From equations (5) to (10), the local obstacle and the target can be remodelled and the path can be planned through Figure 3.

Fuzzy logic descriptions

For the AUV cruising in the complicated environment, it should not only detect the target and avoid obstacles but also overcome oceanic current and uncertain disturbance. The fuzzy logic path planning method includes the following steps:

1. Fuzzification: all the fuzzy logic inputs, including the current AUV position $P_{a} = (x_{a}, y_{a}, z_{a})$ , the position of the ith detected obstacle $P_{o i} = (x_{o i}, y_{o i}, z_{o i})$ , the target position $P_{t} = (x_{t}, y_{t}, z_{t})$ , the current AUV velocity, the heading angle, the relative position between the AUV and the target and the relative position between the AUV and the obstacles, are interpreted into fuzzy linguistic values. The fuzzy membership functions vary with the fuzzy inputs. The fuzzy outputs are the velocity and the expected angle of the AUV.

2. Rule generation and optimization through adaptive ACO and PSO algorithm.

The initial fuzzy rules are combinations of the corresponding forces and reactions. The number of fuzzy rules $n_{r}$ should correspond to the sum and combinations of environmental conditions such as obstacles, targets, current and disturbances. It is related to the number of input variables and to the logical relationships between them. Although the former is easy to obtain, the latter is hard to establish. In this article, the interplay between the AUV state variables and outside information such as velocity, angle and position makes the fuzzy rules even more difficult to establish. The sum and combinations of the environmental conditions refer to the internal relationships of the fuzzy inputs.

The $n_{r}$ fuzzy rules are designed by using the Takagi–Sugeno fuzzy model of ‘IF-THEN’ rules as follows.

The pth rule $R_{p}$ :

IF $‖ P_{a} - P_{o i} ‖ = d_{p i}$ , $‖ P_{g} - P_{a} ‖ = d_{t i}$ , the AUV speed is $v_{a}$ and the current speed is $v_{c}$ , THEN

y_{p} = f_{rep}^{p} + f_{att}^{p}

where

f_{rep}^{p} = \begin{cases} k_{rep} \sum_{i = 1}^{m} \left( \frac{ξ ‖ v_{a} - v_{c} ‖}{‖ P_{a} - P_{o i} ‖} - \frac{1}{d_{max}} \right) \frac{P_{a} - P_{o i}}{‖ P_{a} - P_{o i} ‖} & \text{if } ‖ P_{a} - P_{o i} ‖ < d_{max} \\ 0 & \text{otherwise} \end{cases}

f_{att}^{p} = \begin{cases} k_{att} ‖ v_{a} - v_{c} ‖ \left( λ \operatorname{sgn} (P_{g} - P_{a}) {‖ P_{g} - P_{a} ‖}^{2} + (1 - λ) Θ_{U r} (t) \operatorname{sgn} (P_{t} - P_{a}) {‖ P_{t} - P_{a} ‖}^{2} \right) & \text{if } ‖ P_{g} - P_{a} ‖ > γ_{g} \text{ and } ‖ P_{t} - P_{a} ‖ > γ_{t} \\ k_{att} ‖ v_{a} - v_{c} ‖ \left( λ \operatorname{sgn} (P_{g} - P_{a}) ‖ P_{g} - P_{a} ‖ + (1 - λ) Θ_{U r} (t) \operatorname{sgn} (P_{t} - P_{a}) {‖ P_{t} - P_{a} ‖}^{2} \right) & \text{if } ‖ P_{g} - P_{a} ‖ \leq γ_{g} \text{ and } ‖ P_{t} - P_{a} ‖ > γ_{t} \\ k_{att} ‖ v_{a} - v_{c} ‖ \left( λ \operatorname{sgn} (P_{g} - P_{a}) ‖ P_{g} - P_{a} ‖ + (1 - λ) Θ_{U r} (t) \operatorname{sgn} (P_{t} - P_{a}) ‖ P_{t} - P_{a} ‖ \right) & \text{if } ‖ P_{g} - P_{a} ‖ \leq γ_{g} \text{ and } ‖ P_{t} - P_{a} ‖ \leq γ_{t} \\ k_{att} ‖ v_{a} - v_{c} ‖ \left( λ \operatorname{sgn} (P_{g} - P_{a}) {‖ P_{g} - P_{a} ‖}^{2} + (1 - λ) Θ_{U r} (t) \operatorname{sgn} (P_{t} - P_{a}) ‖ P_{t} - P_{a} ‖ \right) & \text{if } ‖ P_{g} - P_{a} ‖ > γ_{g} \text{ and } ‖ P_{t} - P_{a} ‖ \leq γ_{t 2} \end{cases}

Here $d_{max}$ is the influence distance of the obstacles, ξ is the speed coefficient and $p = 1, 2, \ldots, n_{p}$ is the rule index.

3. Inference mechanism: in this process, the reasoning process includes optimized fuzzy inference rules.

4. Defuzzification: in this process, the defuzzification equation is the weighted average of the fuzzy outputs

y = \frac{\sum_{p = 1}^{n_{p}} w_{p} y_{p}}{\sum_{p = 1}^{n_{p}} w_{p}}

where $w_{p}$ is the weight value.
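As a small illustration, the weighted-average defuzzification can be written as below. The function name and the representation of each rule output $y_{p}$ as a list of components are assumptions made for this sketch.

```python
def defuzzify(rule_outputs, weights):
    """Weighted average of the rule outputs y_p with weights w_p."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one rule must have a non-zero weight")
    dim = len(rule_outputs[0])
    return [
        sum(w * y[k] for w, y in zip(weights, rule_outputs)) / total
        for k in range(dim)
    ]
```

For two rules with outputs (1, 0) and (3, 2) and weights 1 and 3, the result is ((1 + 9)/4, (0 + 6)/4) = (2.5, 1.5), so the rule with the larger weight dominates the planned output.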

Adaptive rules generation and optimization algorithm

For the fuzzy rules to cover all the possible conditions and fuzzy inputs, one would have to consider $n_{r} \times m^{n_{r}}$ rules and conditions, where m is the number of inputs and $n_{r}$ represents the number of fuzzy rules. Since these rules involve the states of various conditions of target positions, obstacle positions, AUV state and current disturbance state, it is difficult to enumerate beforehand all the appropriate linguistic fuzzy rules that cover all the possible situations. The following algorithm adaptively fuses the ACO and PSO algorithms in order to optimize the designed fuzzy rules. The steps of the rule generation and optimization method in Figure 4 are introduced below.

Figure 4.

The block diagram of adaptive PSO and ant rule optimization algorithm. PSO: particle swarm optimization.

The first step is to code each rule into a particle or an ant before ACO and PSO are applied. For the ith input variable and the jth rules solution, the coded result is the rules solution represented individually in the first row of Figure 4. In PSO, the rule particles can move and expand quickly in the search space but without global communication. Each rule particle has a position vector and a velocity vector, which it updates in every learning cycle.

In continuous ACO, by contrast, ant rules are randomly distributed among the particles. Not only can information be exchanged among the ant individuals, but the ACO Gaussian sampling can also push individuals out of a local optimum towards the end of the search.

At the beginning of this algorithm, if $K_{N}$ is the total number of rules solutions and $N_{a N}$ is the number of ant rules individuals, then $K_{N} - N_{a N}$ is the number of particle rules individuals. At the end of the algorithm, $K_{end} = K_{N} - N_{a N}$ represents the increased number of ant rules individuals and $N_{a N}$ represents the decreased number of particle rules individuals. If $I_{max}$ is the maximum iteration number and $I_{N}$ is the current iteration number, the number of ant individuals is obtained from

K_{I c} = int ((K_{end} - N_{a N}) \frac{I_{N}}{I_{max}}) + N_{a N}
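The schedule for the number of ant individuals can be checked with a short Python sketch. The function name is hypothetical, and the initial ant count of 10 used in the usage note is an assumed value chosen only for illustration; the text does not state it.

```python
def ant_count(i_n, i_max, k_n, n_an):
    """Number of ant individuals K_Ic at iteration i_n."""
    k_end = k_n - n_an  # number of ant individuals at the end of the run
    return int((k_end - n_an) * i_n / i_max) + n_an


def particle_count(i_n, i_max, k_n, n_an):
    """Remaining individuals K_N - K_Ic, which are treated as particles."""
    return k_n - ant_count(i_n, i_max, k_n, n_an)
```

With a population of 60, a maximum of 150 iterations and an assumed 10 initial ants, the ant count grows linearly from 10 at iteration 0 to 30 at iteration 75 and 50 at iteration 150, while the particle count falls from 50 to 10. The search therefore shifts gradually from PSO-driven exploration to ACO-driven sampling.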

In the second step, the particle rules individuals are generated and expanded into the search space; the number of generated particle individuals is $K_{N} - K_{I c}$ at iteration $I_{N} + 1$ . Each particle swarm includes many particles, all of which are assigned initial positions. For the ith particle, the position and velocity vectors are set as $p_{p}$ and $v_{p}$ , $p_{i}$ is the best solution found by that particle and $p_{g}$ is the global best solution among the particle individuals. Thus, the ant solutions can guide the particles to better particle solutions. The particle velocity can be obtained as

v_{i}^{p} (I_{N} + 1) = ς \left( v_{i}^{p} (I_{N}) + k_{1}^{p} ψ_{1} \otimes (p_{i} (I_{N}) - p_{p} (I_{N})) + k_{2}^{p} ψ_{2} \otimes (p_{g} (I_{N}) - p_{p} (I_{N})) \right)

where ς is the constriction coefficient, $k_{1}^{p}$ and $k_{2}^{p}$ are positive parameters for particle acceleration and $ψ_{i}$ is a random vector whose entries are uniformly distributed in [0,1]. Here $ς = 0.7$ and $v_{1}^{p} = v_{2}^{p} = 1$ , and each particle position changes according to

p_{p} (I_{N} + 1) = p_{p} (I_{N}) + v_{i}^{p} (I_{N} + 1), i = 1, ..., K_{N} - K_{I c}

where the $K_{N} - K_{I c + 1}$ best-performing particles are retained for the next iteration.
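One particle update of this kind can be sketched in Python as follows. The function and parameter names are hypothetical. The default constriction coefficient 0.7 is taken from the text, whereas the default acceleration parameters `k1 = k2 = 1` are an assumption, because the text gives the value 1 for $v_{1}^{p}$ and $v_{2}^{p}$ rather than explicitly for $k_{1}^{p}$ and $k_{2}^{p}$.

```python
import random


def pso_step(pos, vel, p_best, g_best, chi=0.7, k1=1.0, k2=1.0, rng=random):
    """One constriction-type velocity and position update for a rule particle."""
    new_vel = []
    for x, v, pb, gb in zip(pos, vel, p_best, g_best):
        psi1 = rng.random()  # entry of the random vector ψ_1 in [0, 1)
        psi2 = rng.random()  # entry of the random vector ψ_2 in [0, 1)
        new_vel.append(chi * (v + k1 * psi1 * (pb - x) + k2 * psi2 * (gb - x)))
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

A particle that already sits on both its personal best and the global best is moved only by its damped previous velocity, so with zero velocity it stays in place. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.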

As shown in the diagram of Figure 4, the generated fuzzy rules solutions $s_{1}, ..., s_{N}$ are sorted from best to worst according to their performance. In the graph, every row represents a fuzzy rules solution obtained from a particle or an ant. The route from node to level can be searched by an ant. Each route corresponds to a classification rule, while each node represents a solution vector. The nodes are connected through path segments, each of which carries a pheromone level $τ_{i}$ , and nodes with higher pheromone levels are selected with higher probabilities

τ_{i} (t + 1) = ρ τ_{i} (t) + Δ τ_{i} (t)

where ρ is a parameter between 0 and 1, and $Δ τ_{i} (t)$ is selected from the quality of solution performance. The stronger the pheromone level $τ_{i}$ , the better the solution is. The order of pheromone levels is sorted from the best to the worst.

The third step is ACO algorithm, and the ant generation process includes three stages:

1. Ant path selection: the new fuzzy rules solution is generated through ant path which is generated through the elite tournament technique. At the $I_{N} + 1$ iteration, the total $K_{I c}$ solutions are generated in the ant path selection process. The elite selection approach selects $L_{I c}$ best solutions for temporary solutions. Among the ant path segments, the node value with highest pheromone level is selected. The value of $L_{I c}$ in the elite selection varies with iteration

L_{I c} = int (K_{I c} \frac{I_{N}}{I_{max}})

The fuzzy rules solutions are generated through elite selection with the increase of global search. In the selection process, the node can be selected from fuzzy rules particle or fuzzy rules ant.

2. In the Gaussian sampling process, a new value is sampled near the temporary solution via Gaussian sampling, whose standard deviation is computed as follows

d_{i j} = ζ \sum_{l = 1}^{N} \frac{| {\tilde{s}}_{i j} - s_{l j} |}{N - 1}

where ${\tilde{s}}_{i j}$ is the mean of probability density function and ζ is the constant value. The Gaussian sampling operation process introduces new parameter values to every active rule in the temporary solutions through the mean of probability density function ${\tilde{s}}_{i j}$ . The standard deviation will dynamically change according to iteration number N.

3. In the solution refinement process, through the consideration with the attraction of ants to the optimal individual in the overall population, the solution can be obtained as

{\tilde{s}}_{i} = {\hat{s}}_{i} + ψ_{3} \otimes ({\hat{s}}_{1} - {\hat{s}}_{i})

where $ψ_{3}$ is a random vector whose entries are uniformly distributed in the interval [0,1]. The random vector is used to promote the solution towards the best solution.
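The three stages of ant generation can be sketched together in Python. This is an illustrative reading of the equations above, not the authors' code: the function names are hypothetical, each solution is assumed to be a flat list of rule parameters, and the temporary solution is used as the mean of the Gaussian.

```python
import random


def elite_size(k_ic, i_n, i_max):
    """Number of elite solutions L_Ic, which grows with the iteration."""
    return int(k_ic * i_n / i_max)


def gaussian_std(temp, solutions, j, zeta):
    """Standard deviation d_ij for parameter j of a temporary solution."""
    n = len(solutions)
    return zeta * sum(abs(temp[j] - s[j]) for s in solutions) / (n - 1)


def gaussian_sample(temp, solutions, zeta, rng=random):
    """Sample a new value near every parameter of the temporary solution."""
    return [
        rng.gauss(temp[j], gaussian_std(temp, solutions, j, zeta))
        for j in range(len(temp))
    ]


def refine(s_hat_i, s_hat_best, rng=random):
    """Pull a sampled solution towards the best solution by a random fraction."""
    return [x + rng.random() * (b - x) for x, b in zip(s_hat_i, s_hat_best)]
```

Because the standard deviation is the scaled mean distance from the temporary solution to the other solutions, sampling is wide while the population is spread out and narrows automatically as the solutions cluster. The refinement step then moves each component somewhere between its sampled value and the corresponding component of the best solution.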

In order to realize the iteration and optimization of the rules solutions, reinforcement learning is applied to execute and evaluate the AUV path planning actions. The Q-function maps rules-action pairs to the predicted return.²⁶ Q-function-based learning speeds up convergence and improves system performance under disturbance. The output of the Q-function can be updated as follows

Q (R (t), a_{k} (t)) = Q (R (t), a_{k} (t)) + α [r (t + 1) + γ Q^{*} (R (t + 1)) - Q (R (t), a_{k} (t))]

where $R (t)$ represents the current state of the rules, $r (t + 1)$ is the reinforcement reward for the rules, $Q^{*} (R (t + 1))$ is the optimal estimation over the possible actions and γ is the discount factor. At each step, the expected Q values are updated through

Q (R (t + 1)) = \frac{1}{2} \sum_{j = 1}^{n_{R}} \left( \frac{{\underline{f}}_{j} (R (t))}{\sum_{k = 1}^{n_{R}} {\underline{f}}_{k} (R (t))} + \frac{{\bar{f}}_{j} (R (t))}{\sum_{k = 1}^{n_{R}} {\bar{f}}_{k} (R (t))} \right) q_{j}

According to equation (22), the Q value can be updated as follows:

Δ Q = r (t + 1) + γ Q^{*} (R (t + 1)) - Q (R (t), a (t))

The best Q values correspond with the quality of the rules actions above. For each rule, there will be a greedy action to achieve maximum Q values. For each rules solution and path planning action, the maximum Q value can be obtained from the estimation as

Q^{*} (R (t + 1)) = \frac{1}{2} \sum_{j = 1}^{n_{R}} \left( \frac{{\underline{f}}_{j} (R (t))}{\sum_{k = 1}^{n_{R}} {\underline{f}}_{k} (R (t))} + \frac{{\bar{f}}_{j} (R (t))}{\sum_{k = 1}^{n_{R}} {\bar{f}}_{k} (R (t))} \right) q_{j}^{*}

where $q_{j} (t + 1) = q_{j} (t) + ε Δ q_{j} (t)$ , $j = 1, ..., n_{R}$ , ε is the learning rate and

Δ q_{j} (t) = \frac{1}{2} Δ Q (\frac{{\underline{f}}_{j} (R (t))}{\sum_{k = 1}^{n_{R}} {\underline{f}}_{k} (R (t))} + \frac{{\bar{f}}_{j} (R (t))}{\sum_{k = 1}^{n_{R}} {\bar{f}}_{k} (R (t))})
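The Q-value estimation and the update of the rule values $q_{j}$ can be sketched in Python as follows. The sketch assumes that ${\underline{f}}_{j}$ and ${\bar{f}}_{j}$ are the lower and upper firing strengths of rule j, which the text does not define explicitly, and all function names are hypothetical.

```python
def rule_weights(f_lower, f_upper):
    """Half the sum of the normalised lower and upper strengths of each rule."""
    sum_lower, sum_upper = sum(f_lower), sum(f_upper)
    return [
        0.5 * (fl / sum_lower + fu / sum_upper)
        for fl, fu in zip(f_lower, f_upper)
    ]


def q_value(f_lower, f_upper, q):
    """Q estimate as the weighted sum of the rule values q_j."""
    return sum(w * qj for w, qj in zip(rule_weights(f_lower, f_upper), q))


def td_error(reward, gamma, q_star_next, q_current):
    """Temporal-difference error ΔQ."""
    return reward + gamma * q_star_next - q_current


def update_rule_values(q, f_lower, f_upper, delta_q, eps):
    """q_j(t + 1) = q_j(t) + ε Δq_j(t), with Δq_j proportional to ΔQ."""
    weights = rule_weights(f_lower, f_upper)
    return [qj + eps * delta_q * w for qj, w in zip(q, weights)]
```

Since the rule weights sum to one, the Q estimate is a convex combination of the rule values, and a positive temporal-difference error raises every rule value in proportion to how strongly that rule fired.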

Experimental results

The following section describes experimental results of two different cases of motion plan with the AUV of Figure 1. In the experiments, the number of fuzzy rules is set to 8. The population of ACO and PSO was set to 60. The total iteration number was set as 150. Thus, the total evaluation trials per run were $60 \times 150 = 9000$ . The real-time motion trajectories results were obtained from control.²⁷

Case 1: Canal following for the wall detection

For an inland canal, wall following is very important for wall detection. Although the AUV can carry out delicate detection of underwater targets, it has to cruise close to a wall of complicated contour under disturbances. Therefore, the motion plan should be made according to the detected wall contour and the current distance between the canal walls. In order to follow the canal wall, distance measurement sonars have been equipped on the AUV. For comparison, three motion planning algorithms are used: the common fuzzy rules-based motion plan described in the ‘Obstacles and target modelling’ and ‘Fuzzy logic descriptions’ sections, the PSO-optimized fuzzy rules-based motion plan and the adaptive ACO-PSO-optimized fuzzy rules-based motion plan.

In Figure 5(a), the start point is (0, 0) and the end point is (−120, 1000). The expected distance between the AUV and the canal wall is 4 m. The adaptive ACO-PSO fuzzy planner obtains the most reasonable trajectory for the canal detection. Since all the planned trajectories are obtained according to the AUV real-time state such as heading, speed, obstacle positions and disturbance, the planned trajectories are complicated curves. The motion control of the AUV was realized according to the results of the adaptive ACO-PSO fuzzy planner. As Figure 5 shows, the plain fuzzy planner cannot adapt to the environmental disturbance and the dynamic environment: its fixed rules usually lead to late and inflexible responses, so the errors grow as the disturbance and the turns of the wall become more complicated. The PSO-optimized fuzzy planner benefits from changing its rules according to the real state, but it does not converge quickly in some local regions, which causes large planning errors at times and even oscillation where the disturbance and turns are complicated. The adaptive ACO-PSO fuzzy planner is more adaptable to canals with bends or sharply changing angles, because the ACO and the reinforcement learning help the algorithm with regional rule generation and quick convergence, respectively.

Figure 5.

Comparisons on the motion plan of canal following. (a) Comparisons on the motion plan of wall following of the canal. (b) Heading plan comparisons of motion plan. (c) Distance of AUV from the canal wall. AUV: autonomous underwater vehicle.

Case 2: Moving target following plan

Moving target following is very important but difficult for advanced AUV detection. The AUV should keep a corresponding distance from the target, with the same speed and heading angle, under real-time disturbance. The target following experiments for the motion plan were carried out offshore of Penglai in Shandong Province, China. The oceanic environmental disturbances were obtained from the DVL and are shown in Figure 6. In the experiments, the planned trajectories from the adaptive ACO-PSO fuzzy planner are obtained according to the AUV real-time state such as heading, speed, target speed and position and disturbance. The motion control of the AUV was realized according to the results of the real-time adaptive ACO-PSO fuzzy planning. The ideal distance between the AUV and the mobile target for the target following is 8.5 m.

Figure 6.

Environmental disturbance obtained from DVL. DVL: Doppler Velocity Log.

As Figure 7 shows, the AUV can realize mobile target following through the motion plan of the ACO-PSO fuzzy planning algorithm. The distance between the AUV and the mobile target ranges from 7.5 m to 10 m.

Figure 7.

Motion plan of target following. (a) Following trajectories. (b) Heading plan for target following. (c) Speed plan for target following. (d) Distance from the target during following.

Conclusions

The AUV can realize high-accuracy submarine detection and target following through intelligent path planning. This article has proposed a novel adaptive PSO and ACO fusion-based fuzzy rules optimization algorithm to realize intelligent motion planning. Through this algorithm, the preliminary fuzzy rules can be optimized through continuous selection, sampling, refinement and learning. Two typical cases of experiments including canal wall following and mobile target following have been made in order to verify the validity of intelligent path planning method to implement the complicated operation tasks.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by National Natural Science Foundation of China (no. 61633009, 51579053, 5129050) and the China Scholarship Council (201806685042). It is also supported by the Field Fund of the 13th Five-Year Plan for the Equipment Pre-research Fund (no. 61403120301). All these supports are highly appreciated.

ORCID iD

Huang Hai

References

1. Aghababa MP. 3D path planning for underwater vehicles using five evolutionary optimization algorithms avoiding static and energetic obstacles. Appl Ocean Res 2012; 38: 48–62.

2. Cheng C-T, Fallahi, Leung. A Genetic algorithm-inspired UUV path planner based on dynamic programming. IEEE Trans Syst Man Cybern Syst Part C 2012; 42(6): 1128–1134.

3. Lee T-S, Lee. A new hybrid terrain coverage method for underwater robotic exploration. J Mar Sci Technol 2014; 19: 75–89.

4. Zeng, Lian, Sammut. A survey on path planning for persistent autonomy of autonomous underwater vehicles. Ocean Eng 2015; 110: 303–313.

5. Liu, Chen. Energy priority and current model based AUV cruise path planning algorithm. Int J Multimed Ubiquitous Eng 2013; 8(5): 11–18.

6. Zadeh, Powers DMW, Sammut. Toward efficient task assignment and motion planning for large-scale underwater missions. Int J Adv Robot Syst 2016; 13(5): 1–13.

7. Zeng, Lammas, Sammut. Shell space decomposition based path planning for AUVs operating in a variable environment. Ocean Eng 2014; 91: 181–195.

8. Eichhorn. Optimal routing strategies for autonomous underwater vehicles in time-varying environment. Robot Auton Syst 2015; 67: 33–43.

9. Petres, Pailhas, Patron. Path planning for autonomous underwater vehicles. IEEE Trans Robot 2007; 23(2): 331–341.

10. Zadeh, Yazdani, Sammut. AUV rendezvous online path planning in a highly cluttered undersea environment using evolutionary algorithms. Robotics (cs.RO) 2018; 70: 929–945.

11. Zhuang, Sharma, Subudhi. Efficient collision-free path planning for autonomous underwater vehicles in dynamic environments with a hybrid optimization algorithm. Ocean Eng 2016; 127: 190–199.

12. Park J-W, Kwak H-J, Kang Y-C. Advanced fuzzy potential field method for mobile robot obstacle avoidance. Comput Intell Neurosci 2016; 2016: 13.

13. Sun, Zhu, Jiang. A novel fuzzy control algorithm for three-dimensional AUV path planning based on sonar model. J Intell Fuzzy Syst 2014; 26: 2913–2926.

14. Wang, Sun, Yin. Fuzzy unknown observer-based robust adaptive path following control of underactuated surface vehicles subject to multiple unknowns. Ocean Eng 2019; 176: 57–64.

15. Wang, Xie, Pan. Full-state regulation control of asymmetric underactuated surface vehicles. IEEE Trans Ind Electron 2018. DOI: 10.1109/TIE.2018.2890500.

16. Song, Liu, Bucknall. A multi-layered fast marching method for unmanned surface vehicle path planning in a time-variant maritime environment. Ocean Eng 2017; 129: 301–317.

17. Melingui, Merzouki, Mbede. A novel approach to integrate artificial potential field and fuzzy logic into a common framework for robots autonomous navigation. Proc Inst Mech Eng I J Syst Control Eng 2014; 228(10): 787–801.

18. Incze. Optimized deployment of autonomous underwater vehicles for characterization of coastal waters. J Marine Syst 2009; 78: S415–S424.

19. Wang, S-F, Pan. Yaw-guided trajectory tracking control of an asymmetric underactuated surface vehicle. IEEE Trans Ind Inform 2018. DOI: 10.1109/TII.2018.2877046.

20. Wang, S-F, Han. Backpropagating constraints-based trajectory tracking control of a quadrotor with constrained actuator dynamics and complex unknowns. IEEE Trans Syst Man Cybern Syst 2018. DOI: 10.1109/TSMC.2018.2834515.

21. Adouane. Reactive versus cognitive vehicle navigation based on optimal local and global PELC. Robot Auton Syst 2017; 88: 51–70.

22. Vongbunyong, Kara. Basic behaviour control of the vision-based cognitive robotic disassembly automation. Assembly Automation 2013; 33: 38–56.

23. Fatemi, Haykin. Cognitive control: theory and application. IEEE Access 2014; 2: 698–710.

24. Ellefsen, Lepikson, Albiez. Multiobjective coverage path planning: enabling automated inspection of complex, real-world structures. Appl Soft Comput 2017; 61: 264–282.

25. Przybylski, Putz. D* Extra lite: a dynamic A* with search–tree cutting and frontier–gap reparking. Int J Appl Math Comput Sci 2017; 27(2): 273–290.

26. Seto. Marine robot autonomy. Berlin: Springer, 2013.

27. Huang, Guocheng, Hongde. AUV precise motion control for target following with model uncertainty. Int J Adv Robot Syst 2017; 14(4): 1–11.