A novel approach to integrate potential field and interval type-2 fuzzy learning for the formation control of multiple autonomous underwater vehicles

Abstract

Coordination and formation control of underwater vehicles have attracted increasing attention because of their great potential in real-world applications. However, such vehicles are usually underactuated and face environmental difficulties that differ from those encountered by vehicles (robots) on land. This study proposes a novel approach that integrates a potential field with an interval type-2 fuzzy learning algorithm for the formation control of autonomous underwater vehicles, based on a formation system framework. To cope with system nonlinearity and the complicated environment, a support vector machine is applied to generate optimal rules for the type-2 fuzzy systems. Through classification, this approach can generate optimal and reasonable formation rules for different situations. Furthermore, reinforcement learning is combined with the fuzzy systems to deal with the limited communication state during formation. As a result, the autonomous underwater vehicles can not only execute actions through evaluation but also avoid the coupling between the communication state and the potential field. Finally, simulations and experiments have been performed to validate the proposed methods.

Keywords

Autonomous underwater vehicle, formation control, machine learning, potential field, fuzzy systems

Introduction

Compared with robots on land, underwater vehicles confront a more complicated and extensive environment. It is therefore more important for the vehicles to adapt autonomously to the environment and to cooperate during operations. The collaboration of multiple comparatively cheap and small autonomous underwater vehicles (AUVs) can accomplish complex tasks more effectively than a single expensive, multi-functional AUV.¹

Recently, researchers have made significant efforts on the coordinated formation of multiple AUVs. Different approaches have been proposed, such as the generalized coordinate approach,^2,3 the leader–follower approach,^4,5 the virtual structure approach,⁶ the potential field approach,⁷ and the behavior-based approach. Compared with robot formation on land, the underwater formation of AUVs has to cope with acoustic communication subject to propagation delays and channel noise. Moreover, for the purpose of long-distance cruising and operation, an AUV is usually underactuated.⁸ It is propelled by one thruster, with vertical rudders for heading control and horizontal wings for diving. In order to realize the formation of multiple AUVs, heterogeneous underwater vehicles are usually employed, particularly in a leader–follower formation, to enhance spatial coverage and improve operation accuracy. The leaders are usually equipped with higher performance navigation instruments and perform the leading role, while the followers obtain position information from comparatively less precise navigation instruments and from positioning relative to the leaders. Strategies of relative positioning appear in the work of Matsuda et al.⁹ and Allotta et al.¹⁰

Hou and Cheah from Singapore proposed an adaptive proportional and derivative controller with a multi-layered potential field on the basis of the formation structure¹¹ and realized formations of 6-degree-of-freedom AUVs in simulations.¹² Parlangeli and Indiveri¹³ from Italy established a time-varying nonlinear observability model with state augmentation on the basis of a global observability model, positioned underactuated AUVs during coordinated motion control, and realized formation cruising. Since such AUV groups are composed of heterogeneous AUVs with navigation systems of different accuracies, the leader–follower approach is often used for formation control.¹⁴ Researchers from Portugal and Germany proposed a coordinated tracking strategy for heterogeneous AUVs on the basis of line of sight and verified the strategy with oceanic experiments.¹⁵ Millán et al.¹⁶ from Spain established a linearized model for formation maintenance through the AUV kinematics and a Taylor series expansion and proposed an H2/H∞ controller in order to improve the formation robustness under time-delayed communication. Cui et al.¹⁷ from China constructed a virtual leader AUV in the formation of underactuated AUVs in order to realize trajectory convergence. Moreover, a position tracking controller was proposed for the followers to track the virtual leader using Lyapunov and backstepping synthesis.

However, since the formation of multiple AUVs in the oceanic environment faces limited communication,^9,18 system nonlinearity, and a complicated unknown environment, highly efficient interaction between AUVs may be difficult to achieve with nonlinear formation control algorithms. Therefore, establishing formation rules according to the current state and environment through fuzzy inference and machine learning may improve robustness.¹⁹ Compared with a type-1 fuzzy logic system, a type-2 fuzzy logic system can handle rule uncertainties through its antecedent or consequent membership functions. Although it is likewise characterized by IF-THEN rules, its antecedent or consequent sets are type-2.²⁰ Type-2 fuzzy sets allow researchers to model and minimize the effects of the uncertainties of a changing environment in rule-based systems. Tsai and Tai²¹ presented a distributed consensus formation control law based on recurrent interval type-2 fuzzy neural networks, in which intelligent formation is carried out using online learning and Lyapunov stability theory. Ali et al.²² proposed a type-2 fuzzy ontology simulator to calculate and provide accurate information about collision risk and the marine environment during real-time marine operations. Such a decision-making system provides high-level autonomy for AUV formation. Given the complicated oceanic environment and formation state, this article proposes a support vector machine (SVM)–based type-2 fuzzy learning approach for the formation control of multiple AUVs.

The rest of this study is organized as follows. In section “Multiple AUVs formation system framework,” a formation system framework for multiple AUVs is established. An SVM-based type-2 fuzzy learning approach for the formation control of multiple AUVs is proposed in section “Interval type-2 fuzzy learning approach for AUVs formation.” Simulations and experiments are discussed and analyzed in section “Simulations and experiments.” Conclusions are drawn in section “Conclusion.”

Multiple AUVs formation system framework

In order to realize cooperative control and interaction, a formation control framework has been established (see Figure 1). It consists of a propulsion subsystem, an oceanic survey subsystem, a motion control subsystem, a navigation subsystem, an acoustic communication subsystem, and a formation strategy subsystem. Each subsystem is independent and has its own corresponding hardware. Communication between the subsystems is realized through a PC104 bus–based TCP/IP protocol.

Figure 1.

Multiple AUVs formation system framework.

Communication between AUVs is realized through the AquaComm network acoustic communication module. This multiple-channel communication module is developed by the Innovation in Digital Signal Processing and Processing Company of Australia. It can transmit broadband signals using a spread frequency spectrum with low energy consumption.

In order to establish the topology for the coordination of multiple AUVs, an $n$-order weighted directed graph $G = (V, ξ, A)$ is defined, where $V = \{s_{1}, s_{2}, \dots, s_{n}\}$ is the set of AUV nodes, $ξ \subseteq V \times V$ is the set of edges, and $A = [a_{ij}]$ is a weighted adjacency matrix with nonnegative adjacency elements $a_{ij}$ . Each edge ( $s_{i}$ , $s_{j}$ ) corresponds to an available information channel between the AUV $s_{i}$ and the AUV $s_{j}$ at time t. The node indexes belong to a finite index set $Ω = \{1, 2, \dots, n\}$ . The neighbor set of node $s_{i}$ is $N_{i} = \{s_{j} \in V : (s_{i}, s_{j}) \in ξ\}$ , and $ξ_{ij} = (s_{i}, s_{j})$ denotes an edge of $G$ . The adjacency element $a_{ij}$ is positive exactly when the corresponding edge belongs to the graph, that is, $ξ_{ij} \in ξ \Leftrightarrow a_{ij} > 0$ .
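The graph definition above can be illustrated with a minimal Python sketch. The chain topology, the edge weights, and the function names `build_adjacency` and `neighbors` are hypothetical and chosen only for illustration; node indexes are 0-based here, whereas the text uses 1-based indexes.

```python
from typing import Dict, List, Set, Tuple


def build_adjacency(n: int, edges: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """Build the n x n weighted adjacency matrix A = [a_ij] (0-based indexes)."""
    A = [[0.0] * n for _ in range(n)]
    for (i, j), weight in edges.items():
        if weight <= 0.0:
            raise ValueError("an edge must carry a positive weight a_ij > 0")
        A[i][j] = weight
    return A


def neighbors(A: List[List[float]], i: int) -> Set[int]:
    """Return N_i = {j : a_ij > 0}, i.e. the nodes linked to node i by an edge."""
    return {j for j, a_ij in enumerate(A[i]) if a_ij > 0.0}


# Hypothetical chain: leader (0) <-> follower 1 (1) <-> follower 2 (2).
edges = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 0.5, (2, 1): 0.5}
A = build_adjacency(3, edges)
print(neighbors(A, 1))
```

In this sketch an edge exists if and only if its adjacency element is positive, which mirrors the equivalence $ξ_{ij} \in ξ \Leftrightarrow a_{ij} > 0$ .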

If $(x_{i}, y_{i}, z_{i})$ represents the local coordinates of the ith AUV and $v_{i}$ represents the cruising speed vector of the ith AUV, one obtains

\left\{\begin{matrix} {\overset{\cdot}{x}}_{i} = v_{i}^{H} \cos θ_{i} \\ {\overset{\cdot}{y}}_{i} = v_{i}^{H} \sin θ_{i} \\ {\overset{\cdot}{z}}_{i} = v_{i} \sin ψ_{i} \\ {\overset{\cdot}{θ}}_{i} = ρ_{i} \\ {\overset{\cdot}{ψ}}_{i} = ω_{i} \end{matrix}\right.

(1)

where $ψ_{i}$ denotes the pitch angle of the ith AUV, $θ_{i}$ denotes its heading angle, and $p_{i} = [x_{i}, y_{i}, z_{i}, θ_{i}, ψ_{i}, ϕ_{i}]^{T}$ denotes its position and attitude vector; thus, ${\overset{\cdot}{p}}_{i} = v_{i}$ , with $v_{i} = [{\overset{\cdot}{x}}_{i}, {\overset{\cdot}{y}}_{i}, {\overset{\cdot}{z}}_{i}, {\overset{\cdot}{θ}}_{i}, {\overset{\cdot}{ψ}}_{i}, {\overset{\cdot}{ϕ}}_{i}]^{T}$ . $v_{i}^{H} = v_{i} \cos ψ_{i}$ represents the horizontal velocity of the ith AUV. Since $v_{i}^{H}$ , $θ_{i}$ , and $ψ_{i}$ correspond to the control inputs of the thruster, vertical rudders, and horizontal wings, respectively, a cylindrical coordinate system $(r_{i}, θ_{i}, z_{i})$ has been established for the underactuated AUV

\left\{\begin{matrix} r_{i} = \sqrt{x_{i}^{2} + y_{i}^{2}} \\ z_{i} = r_{i} \tan ψ_{i} \end{matrix}\right. \quad \text{where} \quad \left\{\begin{matrix} x_{i} = r_{i} \cos θ_{i} \\ y_{i} = r_{i} \sin θ_{i} \\ {\overset{\cdot}{r}}_{i} = v_{i}^{H} \end{matrix}\right.

(2)
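As an illustration of equation (1), the following Python sketch advances the kinematic state of one AUV by a single forward-Euler step. The integration scheme, the time step `dt`, and the function name `euler_step` are assumptions made for this example only; the article does not state how the kinematics are integrated.

```python
import math
from typing import Tuple

State = Tuple[float, float, float, float, float]  # (x, y, z, theta, psi)


def euler_step(state: State, v: float, rho: float, omega: float, dt: float) -> State:
    """One forward-Euler step of equation (1).

    v is the cruising speed, rho the heading rate, omega the pitch rate.
    """
    x, y, z, theta, psi = state
    v_h = v * math.cos(psi)  # horizontal velocity v^H = v cos(psi)
    return (
        x + v_h * math.cos(theta) * dt,
        y + v_h * math.sin(theta) * dt,
        z + v * math.sin(psi) * dt,
        theta + rho * dt,
        psi + omega * dt,
    )


# Hypothetical example: level cruise along the x axis at 1 m/s for 1 s.
level = euler_step((0.0, 0.0, 0.0, 0.0, 0.0), v=1.0, rho=0.0, omega=0.0, dt=1.0)
# Hypothetical example: 30 degree pitch at 2 m/s for 1 s.
pitched = euler_step((0.0, 0.0, 0.0, 0.0, math.pi / 6), v=2.0, rho=0.0, omega=0.0, dt=1.0)
print(level, pitched)
```

The radial coordinate of equation (2) can then be obtained from the updated position as `math.hypot(x, y)`.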

In order to investigate the dynamic state during the formation of multiple AUVs, the formation is treated as a constrained multibody system in which adjacent AUVs are connected through virtual springs (see Figure 2). The virtual spring between the ith and the jth AUV has stiffness $K_{ij}$ and is at its natural length $R_{ij}$ when these AUVs are cruising at their expected positions. When the actual distance $L_{ij}$ between the ith AUV and the jth AUV is greater than the expected distance, an attractive force $F_{ij}$ is generated

F_{ij} = K_{ij} (L_{ij} - R_{ij})

(3)

Figure 2.

Model of multi-AUV formation.

Conversely, when the actual distance $L_{ij}$ is smaller than the expected distance, a repulsive force is generated

F_{ij} = K_{ij} (R_{ij} - L_{ij})

(4)
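Equations (3) and (4) can be read as one piecewise rule, sketched below in Python. The function name `spring_force` and the numerical values are hypothetical; the sketch returns the kind of force together with its magnitude.

```python
from typing import Tuple


def spring_force(K: float, L: float, R: float) -> Tuple[str, float]:
    """Virtual spring force between two AUVs.

    K: stiffness, L: actual distance, R: natural (expected) length.
    Returns ("attractive", K (L - R)) when L > R, following equation (3),
    and ("repulsive", K (R - L)) when L < R, following equation (4).
    """
    if L > R:
        return ("attractive", K * (L - R))
    if L < R:
        return ("repulsive", K * (R - L))
    return ("none", 0.0)


# Hypothetical values: stiffness 2, expected spacing 10 m.
print(spring_force(2.0, 12.0, 10.0))
print(spring_force(2.0, 7.0, 10.0))
```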

Since the formation of multiple AUVs needs to maintain vertical and horizontal distances at the same time, the natural length of the virtual spring $R_{ij}$ is decomposed into ${x_{R}}_{ij}$ , ${y_{R}}_{ij}$ , and ${z_{R}}_{ij}$ along the three coordinate directions. Thus, the attractive and repulsive forces can be obtained as

\left\{\begin{matrix} F_{att} {(x)}_{ij} = K_{xij} ({x_{L}}_{ij} - {x_{R}}_{ij}) \\ F_{att} {(y)}_{ij} = K_{yij} ({y_{L}}_{ij} - {y_{R}}_{ij}) \\ F_{att} {(z)}_{ij} = K_{zij} ({z_{L}}_{ij} - {z_{R}}_{ij}) \end{matrix}\right. \quad \text{and} \quad \left\{\begin{matrix} F_{rep} {(x)}_{ij} = K_{xij} ({x_{R}}_{ij} - {x_{L}}_{ij}) \\ F_{rep} {(y)}_{ij} = K_{yij} ({y_{R}}_{ij} - {y_{L}}_{ij}) \\ F_{rep} {(z)}_{ij} = K_{zij} ({z_{R}}_{ij} - {z_{L}}_{ij}) \end{matrix}\right.

The potential energy derived from the attractive and repulsive forces is

\left\{\begin{matrix} P_{att} {(x)}_{ij} = \frac{K_{xij}}{2} {({x_{L}}_{ij} - {x_{R}}_{ij})}^{2} \\ P_{att} {(y)}_{ij} = \frac{K_{yij}}{2} {({y_{L}}_{ij} - {y_{R}}_{ij})}^{2} \\ P_{att} {(z)}_{ij} = \frac{K_{zij}}{2} {({z_{L}}_{ij} - {z_{R}}_{ij})}^{2} \end{matrix}\right. \quad \text{and} \quad \left\{\begin{matrix} P_{rep} {(x)}_{ij} = \frac{K_{xij}}{2} {({x_{R}}_{ij} - {x_{L}}_{ij})}^{2} \\ P_{rep} {(y)}_{ij} = \frac{K_{yij}}{2} {({y_{R}}_{ij} - {y_{L}}_{ij})}^{2} \\ P_{rep} {(z)}_{ij} = \frac{K_{zij}}{2} {({z_{R}}_{ij} - {z_{L}}_{ij})}^{2} \end{matrix}\right.

where ${x_{L}}_{ij}$ , ${y_{L}}_{ij}$ , and ${z_{L}}_{ij}$ are the horizontal and vertical distances between the ith and the jth AUV

\left\{\begin{matrix} {z_{L}}_{ij} = | z_{i} - z_{j} | \\ {r_{L}}_{ij} = \sqrt{{(x_{i} - x_{j})}^{2} + {(y_{i} - y_{j})}^{2}} \end{matrix}\right.

Since the formation of multiple AUVs faces complicated environmental disturbances and limited communication, a virtual spring with multiple layers of stiffness has been established. In Figure 3, $r_{1 i}$ is the innermost layer and $r_{Ri}$ is the outermost layer. For the ith AUV, attractive forces are derived from the virtual spring inside the layer $r_{Ri}$ , while repulsive forces are derived from the virtual spring outside the layer $r_{Ri}$ .

Figure 3.

Multi-layer region of the ith AUV.

The attractive or repulsive force exerted on the jth AUV is the sum of the attractive or repulsive forces derived from the virtual springs from the Rth layer to the layer in which the jth AUV is located. Moreover, when the jth AUV is located in the innermost layer, the repulsive energy exerted by the ith AUV on the jth AUV is the sum of the virtual spring repulsive energies from the Rth layer to the first layer. Because the AUV is underactuated and its vertical and horizontal movements are realized through the thruster together with the wings or rudders, the potential functions are order reduced using a logarithmic function so as to improve the convergence of the controller (see Figure 6 in the simulation)

\left\{\begin{array}{l} P_{rep} {(x)}_{ij} = \frac{K_{xij}}{2} \ln [{(x_{1 i} - x_{Lij})}^{2}] + \frac{K_{xij}}{2} \ln [{(x_{2 i} - x_{Lij})}^{2}] + \dots + \frac{K_{xij}}{2} \ln [{(x_{Ri} - x_{Lij})}^{2}] \\ P_{rep} {(y)}_{ij} = \frac{K_{yij}}{2} \ln [{(y_{1 i} - y_{Lij})}^{2}] + \frac{K_{yij}}{2} \ln [{(y_{2 i} - y_{Lij})}^{2}] + \dots + \frac{K_{yij}}{2} \ln [{(y_{Ri} - y_{Lij})}^{2}] \\ P_{rep} {(z)}_{ij} = \frac{K_{zij}}{2} \ln [{(z_{1 i} - z_{Lij})}^{2}] + \frac{K_{zij}}{2} \ln [{(z_{2 i} - z_{Lij})}^{2}] + \dots + \frac{K_{zij}}{2} \ln [{(z_{Ri} - z_{Lij})}^{2}] \end{array}\right.

(5)
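To make the effect of the order reduction concrete, the following Python sketch compares a second-order multi-layer potential with the logarithmic form of equation (5) along one axis. The layer radii, the stiffness, and the small constant `eps` that guards the logarithm when the AUV sits exactly on a layer boundary are assumptions of this example and do not appear in the article.

```python
import math
from typing import Sequence


def quadratic_potential(K: float, layers: Sequence[float], d: float) -> float:
    """Sum over layers of K/2 * (layer - d)^2 (second-order potential)."""
    return sum(0.5 * K * (r - d) ** 2 for r in layers)


def log_potential(K: float, layers: Sequence[float], d: float, eps: float = 1e-9) -> float:
    """Sum over layers of K/2 * ln[(layer - d)^2], one axis of equation (5).

    eps is a hypothetical guard that keeps the logarithm finite at d == layer.
    """
    return sum(0.5 * K * math.log((r - d) ** 2 + eps) for r in layers)


# Hypothetical layers at 5, 10, and 20 m, with the neighbor 2 m away.
layers = [5.0, 10.0, 20.0]
p_quad = quadratic_potential(1.0, layers, 2.0)
p_log = log_potential(1.0, layers, 2.0)
print(p_quad, p_log)
```

For these values the logarithmic potential is much smaller than the second-order one, which is consistent with the reduced potential magnitude that the text associates with better convergence for an underactuated vehicle.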

Thus, a multi-layered region for multiple AUV formation can be defined in Figure 4, in order to improve the formation stability. $p_{c}^{d}$ represents geometric center of the formation shape. The attractive or repulsive forces range of the ith AUV $p_{i}$ is set according to the formation requirement.

Figure 4.

The multi-layer region for AUV formation.

Interval type-2 fuzzy learning approach for AUVs formation

As an effective and reliable machine learning technique, the SVM has been widely used in classification problems. It constructs a hyperplane that separates two classes of data with the largest margin.²³ Compared with neural fuzzy systems, the basic SVM requires less prior knowledge of the problem and can solve the learning problem with a smaller number of samples. Instead of minimizing an objective function on the basis of the training data, the SVM attempts to minimize a bound on the generalization error.²⁴ Since the formation of multiple AUVs involves tracking and formation maintenance, collision avoidance, oceanic currents, uncertain disturbances, and so on, an enumeration of fuzzy linguistic rules cannot cover all situations. Although the SVM is computationally expensive, it can generate optimal and reasonable rules for the current situation through classification. Therefore, the interval type-2 fuzzy learning approach integrates the SVM to generate optimal rules for complicated formation conditions. Furthermore, in combination with reinforcement learning, the fuzzy systems can deal with the limited communication state during the formation of multiple AUVs. The structure is illustrated in Figure 5.

Figure 5.

Structure of interval type-2 fuzzy learning approach.

SVM rules generation algorithm

For the formation state between the ith and the jth AUV, the rules are set as follows: IF ( $P (x)_{ij}$ , $P (y)_{ij}$ , $P (z)_{ij}$ ) is ( $\tilde{D} (x)_{ij}^{I}$ , $\tilde{D} (y)_{ij}^{I}$ , $\tilde{D} (z)_{ij}^{I}$ ), AND the expected cruising trajectory is ( $x_{di}$ , $y_{di}$ , $z_{di}$ ), AND the oceanic current state is $v_{c}$ , THEN the output is the expected state of the ith AUV ( $v_{i}^{H}$ , $θ_{i}$ , $ψ_{i}$ ).

Evidently, the states include permutations and combinations of various conditions of the potential field, the cruising trajectory tracking state, the uncertain oceanic current, and so on. It is difficult to enumerate linguistic rules and input-output data pairs that cover all fuzzy situations. This article therefore applies the SVM as a tool for determining the fuzzy rules. For the SVM rule generation approach, the decision function is defined as

f (X^{I}) = w^{T} ϕ (X^{I}) + b

(6)

where $X^{I}$ is the set of input signals $X^{I} = (x_{1}^{I}, x_{2}^{I}, \dots, x_{ni}^{I})$ , $ϕ (X^{I})$ is a nonlinear function which maps the input vector $X^{I}$ into a higher dimensional feature space, $w$ is the ni-dimensional weight vector, and $b$ is a scalar bias.

Obtaining a hyperplane that separates the two classes leads to the following optimization problem²⁴

\left\{\begin{matrix} \min \frac{1}{2} w^{T} w \\ \text{subject to } y_{l}^{I} (w^{T} ϕ (X^{I}) + b_{l}) \geq 1, \quad \forall l \end{matrix}\right.

(7)

Thus, optimal hyper plane will be formulated from the following optimization problem

\left\{\begin{matrix} \min \left(\frac{1}{2} w^{T} w + C \sum_{i = 1}^{nr} ξ_{i}\right) \\ \text{subject to } y_{i}^{I} (w^{T} ϕ (X^{I}) + b_{i}) \geq 1 - ξ_{i}, \quad ξ_{i} \geq 0, \ i = 1, 2, \dots, nr \end{matrix}\right.

(8)

where C > 0 is the regularization parameter, which controls the trade-off between the margin and the classification error, and $ξ_{i}$ are the slack variables. The primal problem of equation (8) can be solved by means of the following Lagrangian function

\begin{array}{l} L = \frac{1}{2} w^{T} w + C \sum_{i = 1}^{n r} ξ_{i} - \sum_{i = 1}^{n r} α_{i} ξ_{i} \\ - \sum_{i = 1}^{n r} β_{i} [y_{i}^{I} (w^{T} ϕ (X^{I}) + b_{i}) + ξ_{i} - 1] \end{array}

(9)

where $α_{i}$ and $β_{i}$ ( $0 \leq β_{i}, α_{i} \leq C$ ) are Lagrange multipliers. Therefore, the following dual quadratic problem can be obtained from equations (8) and (9)

\left\{\begin{matrix} \max \left[\sum_{i = 1}^{ni} β_{i} - \frac{1}{2} \sum_{i = 1}^{ni} \sum_{j = 1}^{ni} β_{i} β_{j} y_{i}^{I} y_{j}^{I} K (x_{i}^{I}, x_{j}^{I})\right] \\ \text{subject to } \sum_{i = 1}^{ni} y_{i}^{I} β_{i} = 0, \quad 0 \leq β_{i} \leq C, \ i = 1, 2, \dots, ni \end{matrix}\right.

(10)

where $K (x_{i}^{I}, x_{j}^{I})$ is a kernel function which is defined as

K (x_{i}^{I}, x_{j}^{I}) = ϕ (x_{i}^{I}) \cdot ϕ (x_{j}^{I}) = \exp (- γ {‖ x_{i}^{I} - x_{j}^{I} ‖}^{2})

where $γ$ is the scaling factor. Therefore, the decision function is obtained as

f (X^{I}) = sgn [\sum_{i = 1}^{ni} β_{i} y_{i}^{I} K (x_{i}^{I}, X^{I}) + b]

(11)
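The decision function of equation (11) with the Gaussian kernel can be sketched in a few lines of Python. The support vectors, labels, multipliers $β_{i}$ , bias $b$ , and scaling factor $γ$ below are hypothetical values chosen by hand rather than the result of solving the dual problem (10), and mapping a zero score to +1 is a convention assumed for this example.

```python
import math
from typing import Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def decide(x, support_vectors, labels, betas, b, gamma) -> int:
    """Equation (11): sign of sum_i beta_i * y_i * K(x_i, x) + b."""
    score = b + sum(
        beta * y * rbf_kernel(sv, x, gamma)
        for sv, y, beta in zip(support_vectors, labels, betas)
    )
    return 1 if score >= 0.0 else -1  # assumed convention: sgn(0) -> +1


# Hypothetical two-class setup; sum_i y_i * beta_i = 0 holds as in (10).
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]
betas = [1.0, 1.0]

print(decide((1.8, 1.9), support_vectors, labels, betas, b=0.0, gamma=0.5))
print(decide((0.1, 0.2), support_vectors, labels, betas, b=0.0, gamma=0.5))
```

In practice the multipliers would be obtained from a quadratic programming solver or an SVM library; the sketch only shows how a trained classifier assigns a formation state to one of two classes.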

Interval type-2 fuzzy learning approach

The system contains two components: the type-reducer, which maps a type-2 fuzzy set into a type-1 fuzzy set and a normal defuzzifier, which transforms a fuzzy output into a crisp output. A type-2 fuzzy set $\tilde{D}$ is defined as

\tilde{D} = \{((x^{I}, u), μ_{\tilde{D}} (x^{I}, u)) \mid \forall x^{I} \in X^{I}, \forall u \in J_{x} \subseteq [0, 1]\}

(12)

where $μ_{\tilde{D}} (x^{I}, u) \in [0, 1]$ , $x^{I}$ represents the input element of the type-2 fuzzy logic system, the set $X^{I}$ includes the current potential field, communication state, expected cruising trajectory, and oceanic current, and $J_{x}$ represents the primary membership of $x^{I}$ in $\tilde{D}$ . Equivalently

\tilde{D} = \int_{x^{I} \in X^{I}} \int_{u \in J_{x}} μ_{\tilde{D}} (x^{I}, u) / (x^{I}, u), \quad J_{x} \subseteq [0, 1]

(13)

where ∫∫ indicates the union over all admissible $x^{I}$ and u. Let the lth input element be $x_{l}^{I}$ and its fuzzy set be ${\tilde{D}}_{l}^{I}$ , where $l = 1, 2, \dots, ni$ and ni is the number of input signals. A Gaussian primary membership function with uncertain mean and fixed standard deviation $σ_{kl}$ is expressed as

μ_{{\tilde{D}}_{kl}} = \exp [- \frac{1}{2} {(\frac{x_{kl}^{I} - m_{kl}}{σ_{kl}})}^{2}]

(14)

where $m_{kl} \in [m_{kl 1}, m_{kl 2}]$ . The membership degree $μ_{{\tilde{D}}_{kl}}$ is represented as a bounded interval $μ_{{\tilde{D}}_{kl}} \in [{\underline{μ}}_{{\tilde{D}}_{kl}}, {\bar{μ}}_{{\tilde{D}}_{kl}}]$ , where ${\underline{μ}}_{{\tilde{D}}_{kl}}$ and ${\bar{μ}}_{{\tilde{D}}_{kl}}$ represent the lower and upper bounds, respectively, k = 1, 2,…, nr indexes the rules, and nr is the number of rules

\begin{array}{l} {\bar{μ}}_{{\tilde{D}}_{kl}} = \left\{\begin{matrix} μ_{{\tilde{D}}_{kl}} (m_{kl 1}, σ_{kl}, x_{kl}^{I}), & x_{kl}^{I} < m_{kl 1} \\ 1, & m_{kl 1} \leq x_{kl}^{I} \leq m_{kl 2} \\ μ_{{\tilde{D}}_{kl}} (m_{kl 2}, σ_{kl}, x_{kl}^{I}), & x_{kl}^{I} > m_{kl 2} \end{matrix}\right. \quad \text{and} \\ {\underline{μ}}_{{\tilde{D}}_{kl}} = \left\{\begin{matrix} μ_{{\tilde{D}}_{kl}} (m_{kl 2}, σ_{kl}, x_{kl}^{I}), & x_{kl}^{I} \leq \frac{m_{kl 1} + m_{kl 2}}{2} \\ μ_{{\tilde{D}}_{kl}} (m_{kl 1}, σ_{kl}, x_{kl}^{I}), & x_{kl}^{I} > \frac{m_{kl 1} + m_{kl 2}}{2} \end{matrix}\right. \end{array}

(15)
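The upper and lower membership bounds of equations (14) and (15) can be written as a short Python sketch. The function names `gauss` and `interval_membership` and the numerical values are hypothetical and serve only to illustrate the piecewise definition.

```python
import math
from typing import Tuple


def gauss(x: float, m: float, sigma: float) -> float:
    """Gaussian primary membership of equation (14) with mean m."""
    return math.exp(-0.5 * ((x - m) / sigma) ** 2)


def interval_membership(x: float, m1: float, m2: float, sigma: float) -> Tuple[float, float]:
    """Return (lower, upper) membership for an uncertain mean in [m1, m2]."""
    # Upper bound: flat top between the two means, Gaussian tails outside.
    if x < m1:
        upper = gauss(x, m1, sigma)
    elif x <= m2:
        upper = 1.0
    else:
        upper = gauss(x, m2, sigma)
    # Lower bound: the Gaussian centred on the mean farther from x.
    if x <= 0.5 * (m1 + m2):
        lower = gauss(x, m2, sigma)
    else:
        lower = gauss(x, m1, sigma)
    return lower, upper


# Hypothetical set with mean in [2, 4] and sigma = 1.
print(interval_membership(3.0, 2.0, 4.0, 1.0))
print(interval_membership(0.0, 2.0, 4.0, 1.0))
```

The gap between the two bounds is the footprint of uncertainty; it shrinks to zero when the two means coincide, in which case the set reduces to an ordinary type-1 Gaussian set.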

The fuzzy meet operation is implemented using the algebraic product. The result is an interval type-1 fuzzy set. For the kth rule, the firing strength is computed as

F_{k} = \left[\prod_{l = 1}^{ni} {\underline{μ}}_{{\tilde{D}}_{kl}}, \prod_{l = 1}^{ni} {\bar{μ}}_{{\tilde{D}}_{kl}}\right] = [{\underline{f}}_{k}, {\bar{f}}_{k}]

(16)

where ${\underline{f}}_{k}$ is the lower firing strength and ${\bar{f}}_{k}$ is the upper firing strength. If $v_{L}$ denotes the leftmost point and $v_{R}$ the rightmost point of the interval type-1 fuzzy set, and $a_{k}$ denotes the consequent of the kth rule, they can be expressed as

\begin{array}{l} v_{L} = \frac{\sum_{k = 1}^{L c} {\bar{f}}_{k} a_{k} + \sum_{k = L c + 1}^{n r} {\underline{f}}_{k} a_{k}}{\sum_{k = 1}^{L c} {\bar{f}}_{k} + \sum_{k = L c + 1}^{n r} {\underline{f}}_{k}}, \\ v_{R} = \frac{\sum_{k = 1}^{R c} {\underline{f}}_{k} a_{k} + \sum_{k = R c + 1}^{n r} {\bar{f}}_{k} a_{k}}{\sum_{k = 1}^{R c} {\underline{f}}_{k} + \sum_{k = R c + 1}^{n r} {\bar{f}}_{k}} \end{array}

(17)

where Lc and Rc represent the left and right crossover points, respectively, and nr is the number of the rules. On the basis of the center sets type reduction, the defuzzified output is the average of $v_{L}$ and $v_{R}$

v_{o} = \frac{(v_{R} + v_{L})}{2}

(18)
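Equations (16) to (18) can be sketched in Python as follows. Instead of an iterative search for the crossover points Lc and Rc, this example simply tries every possible crossover position after sorting the rule consequents $a_{k}$ , which is a simplification assumed here and is practical only for a small number of rules. The function names and values are hypothetical, and all lower firing strengths are assumed to be positive so that the denominators never vanish.

```python
from typing import List, Sequence, Tuple


def firing_interval(memberships: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Equation (16): product of lower and of upper memberships over the inputs."""
    lower, upper = 1.0, 1.0
    for mu_lower, mu_upper in memberships:
        lower *= mu_lower
        upper *= mu_upper
    return lower, upper


def _weighted_mean(weights: List[float], values: List[float]) -> float:
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def type_reduce(f_lower: Sequence[float], f_upper: Sequence[float],
                a: Sequence[float]) -> Tuple[float, float, float]:
    """Equations (17) and (18): return (v_L, v_R, v_o)."""
    order = sorted(range(len(a)), key=lambda k: a[k])  # sort rules by consequent
    lo = [f_lower[k] for k in order]
    hi = [f_upper[k] for k in order]
    c = [a[k] for k in order]
    n = len(c)
    # v_L: upper strengths up to the crossover, lower strengths after it.
    v_left = min(_weighted_mean(hi[:s] + lo[s:], c) for s in range(n + 1))
    # v_R: lower strengths up to the crossover, upper strengths after it.
    v_right = max(_weighted_mean(lo[:s] + hi[s:], c) for s in range(n + 1))
    return v_left, v_right, 0.5 * (v_left + v_right)


# Hypothetical three-rule example.
v_L, v_R, v_o = type_reduce([0.2, 0.2, 0.2], [0.8, 0.8, 0.8], [1.0, 2.0, 3.0])
print(v_L, v_R, v_o)
```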

Since reinforcement learning solves optimization problems through online interactive learning rather than depending on an existing database,²⁵ it can help AUVs issue and improve formation control commands under uncertain conditions.²⁶ In the formation of multiple AUVs, uncertain acoustic communication conditions may greatly affect the knowledge of the positions of adjacent AUVs. The interval type-2 fuzzy learning approach applies the Q-learning algorithm to optimize $v_{o}$ according to the current acoustic communication state $s (t)$ by maximizing the reward function. Thus, the ith AUV can not only execute an action $a_{k}$ through evaluation but also avoid the coupling between the communication state and the potential field. According to the Q-learning algorithm, the value of the action $a_{k}$ is updated as follows¹⁹

Q (s (t), a_{k} (t)) = Q (s (t), a_{k} (t)) + α [r (t + 1) + γ Q^{*} (s (t + 1)) - Q (s (t), a_{k} (t))]

(19)

where $r (t + 1)$ is the reinforcement reward and $Q^{*} (s (t + 1))$ is the optimal estimation in the set of possible actions. The expected Q value for the action output is

Q (s (t + 1)) = \frac{1}{2} \sum_{i = 1}^{nr} \left(\frac{{\underline{f}}_{i} (s (t))}{\sum_{k = 1}^{nr} {\underline{f}}_{k} (s (t))} + \frac{{\bar{f}}_{i} (s (t))}{\sum_{k = 1}^{nr} {\bar{f}}_{k} (s (t))}\right) q_{i}

(20)

According to equation (19), the Q value will be updated as

Δ Q = r (t + 1) + γ Q^{*} (s (t + 1)) - Q (s (t), a (t))

(21)

For each rule and action, the maximum Q value can be estimated as

Q^{*} (s (t + 1)) = \frac{1}{2} \sum_{i = 1}^{nr} (\frac{{\underline{f}}_{i} (s (t))}{\sum_{k = 1}^{nr} {\underline{f}}_{k} (s (t))} + \frac{{\bar{f}}_{i} (s (t))}{\sum_{k = 1}^{nr} {\bar{f}}_{k} (s (t))}) q_{i}^{*}

(22)

where $q_{i} (t + 1) = q_{i} (t) + ε Δ q_{i} (t)$ , $i = 1, \dots, nr$ , and $ε$ is the learning rate

Δ q_{i} (t) = \frac{1}{2} Δ Q (\frac{{\underline{f}}_{i} (s (t))}{\sum_{k = 1}^{nr} {\underline{f}}_{k} (s (t))} + \frac{{\bar{f}}_{i} (s (t))}{\sum_{k = 1}^{nr} {\bar{f}}_{k} (s (t))})

(23)
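The update of equations (20) to (23) can be sketched in Python as shown below. The reward, discount factor, learning rate, and firing strengths are hypothetical values, the estimate of $Q^{*} (s (t + 1))$ is passed in as a plain number, and the function names are introduced only for this example. The firing strengths are assumed to have positive sums.

```python
from typing import List, Sequence


def rule_weights(f_lower: Sequence[float], f_upper: Sequence[float]) -> List[float]:
    """Average of the normalized lower and upper firing strengths per rule."""
    sum_lower, sum_upper = sum(f_lower), sum(f_upper)
    return [0.5 * (fl / sum_lower + fu / sum_upper)
            for fl, fu in zip(f_lower, f_upper)]


def q_value(weights: Sequence[float], q: Sequence[float]) -> float:
    """Equation (20): firing-strength-weighted sum of the rule values q_i."""
    return sum(w * q_i for w, q_i in zip(weights, q))


def q_update(q: Sequence[float], weights: Sequence[float], reward: float,
             q_star_next: float, gamma: float, eps: float) -> List[float]:
    """Equations (21) and (23): temporal-difference update of every q_i."""
    delta_Q = reward + gamma * q_star_next - q_value(weights, q)
    return [q_i + eps * delta_Q * w for q_i, w in zip(q, weights)]


# Hypothetical two-rule example starting from zero rule values.
weights = rule_weights([0.2, 0.2], [0.6, 0.2])
q_new = q_update([0.0, 0.0], weights, reward=1.0, q_star_next=0.0, gamma=0.9, eps=0.5)
print(weights, q_new)
```

Rules that fire more strongly in the current communication state receive a larger share of the temporal-difference error, so their values adapt faster.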

Simulations and experiments

Experimental setup

In order to validate the proposed interval type-2 fuzzy learning approach, a simulation platform with three heterogeneous underactuated AUVs has been established. These AUVs are connected in series, and their parameters are presented in Table 1. These parameters correspond to three real AUVs available at our institutions. The simulation platform includes three different AUV control platforms and maneuverability simulation platforms.

Table 1.

Size and maneuverability parameters for the three AUVs.

AUV	Length (m)	Maximal rotary body diameter (m)	Weight in the air (kg)	Maximal thrust of main thruster (kg)
Leader (AUV1)	5.6	0.8	1300	180
Follower 1 (AUV2)	2.3	0.32	160	26
Follower 2 (AUV3)	1.2	0.3	80	20

AUV: autonomous underwater vehicle.

In the AUV control platforms, the S surface control scheme²⁷ is applied for AUV control, and the control commands of the propeller, heading rudder, and horizontal wing are produced through thrust allocation in order to realize cruising velocity, heading, and diving control.

The maneuverability simulation platform has been established on the basis of dynamic equations with the hydrodynamic parameters of the corresponding AUV. In this platform, the inputs are the control voltage of the propeller and the angle control commands of the heading rudder and horizontal wing. The control voltage is converted into propeller thrust through thrust–voltage curves obtained from thruster experiments, and the outputs are the real-time positions of the AUV deduced from the hydrodynamic equations. The formation control and planning sections are set up on the leader AUV.

Figure 6 illustrates the simulated formation trajectories of the three AUVs of Table 1. The three AUVs set out from the initial positions (0,0,0), (40,40,0), and (60,60,0), respectively. They then make a 6 m dive and a 90° heading change and afterward move toward the north. Since the AUVs differ in size, gyration radius, and maneuverability, the formation process involves the coupling effects of diving and heading control. To take underactuation into consideration, the potential functions are order reduced using a logarithmic function so as to improve the convergence of the controller. As a result, the AUV formation stabilizes after fluctuation and adjustment.

Figure 6.

Potential field comparisons between second order and “ln.”

In the formation simulations of Figure 7, the adaptive potential formation scheme of Huang et al.⁷ is compared with the formation approach of this article. The disturbance is a current with speed v_cur = 0.2 m/s toward the west. In Figure 7(a) and (b), the three AUVs are planned to follow a folding line and a round curve in a line shape, that is, the followers are planned to maintain the same distance one after another. The membership functions of the type-2 fuzzy logic system take the potential field, expected cruising trajectory, and oceanic current state as inputs (see Figure 8). The comparisons show that, for underactuated AUVs, a large change of potential energy combined with a current disturbance may cause fluctuation of the formation shape, and even failure, since only limited controlled degrees of freedom are available. In contrast, the interval type-2 fuzzy learning approach integrated with the potential field automatically generates fuzzy rules according to the current conditions and reduces the magnitude of the potential energy through the SVM rule generation algorithm. As a result, it not only keeps the formation shape more stable but also improves the robustness of the controller against the current disturbance.

Figure 7.

Comparison between adaptive potential field (APF) and type-2 fuzzy learning approach integrated with potential field (T2FLP): (a) formation along folding line and (b) formation along round curve.

Figure 8.

Membership functions.

Figure 9 illustrates the simulation of the interval type-2 fuzzy learning approach with communication state evaluation. In the simulations, the three AUVs form up on the basis of the topology, and the communication state, namely package loss, time delay, and throughput, is prescribed in Figure 9(a)–(c). Before reinforcement learning, the formation of the AUVs cannot be realized because of the communication deficiency. After reinforcement learning, as shown in Figure 9(d), the formation approach has been optimized through communication evaluation, actions, and rewards, and the formation of the AUVs is greatly improved.

Figure 9.

Simulation results from communication state evaluation: (a) package delay, (b) package loss, (c) throughput, and (d) formation trajectory.

Results from offshore experiments of AUV formation on the water surface are illustrated in Figure 10. The AUV formation traces were given as pre-programmed missions. In the first scenario, the AUVs were planned to follow a 90° yaw trace to test the formation performance of heterogeneous vehicles. In the second scenario, the AUVs were planned to follow linear traces. In the third scenario, the AUVs were planned to follow a polygon path to validate the performance of regional search. The experiments show that vehicles of different sizes, gyration radii, and maneuverability can realize formation cruising under the proposed approach.

Figure 10.

Surface experiment on straight and folding trajectories.

Conclusion

Considering AUV operation and the formation framework, this study has developed a novel approach that integrates a potential field with interval type-2 fuzzy learning for the formation control of multiple AUVs, based on a formation system framework. The contributions and novelties of this article are twofold:

To cope with the nonlinear character of the system and the complicated formation state, the SVM has been integrated into the type-2 fuzzy learning system to generate optimal rules. In the simulations, this approach automatically generates fuzzy rules and reduces the magnitude of the potential energy; as a result, it not only keeps the formation shape more stable but also improves the robustness of the controller against current disturbance.

Reinforcement learning has been combined with the type-2 fuzzy systems to deal with acoustic communication conditions during the formation process. Therefore, the AUVs can not only execute actions through evaluation but also avoid the coupling between the communication state and the potential field.

Simulations and comparisons of cruising under disturbances illustrate the effectiveness of the proposed formation strategies. Offshore experiments on the surface formation of heterogeneous AUVs demonstrate the performance of the approach in regional coverage.

Footnotes

Handling Editor: Tao Feng

Author’s note

Authors Qirong Tang and Hai Huang contributed equally to this work.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project is supported by National Science Foundation of China (Nos 61633009, 51579053, 5129050), Major National Science and Technology Project (2015ZX01041101), the Promotion Funds for the National Significant Requirements of Central Universities (HEUCFP201603) and also funded by the Key Basic Research Project of “Shanghai Science and Technology Innovation Plan” (No. 15JC1403300). All these supports are highly appreciated.

References

1. Park. Adaptive formation control of underactuated autonomous underwater vehicles. Ocean Eng 2015; 96: 1–7.

2. Woolsey, Techy. Cross-track control of a slender, underactuated AUV using potential shaping. Ocean Eng 2009; 36: 82–91.

3. Xue. Adaptive coordinated tracking control of multiple autonomous underwater vehicles. Ocean Eng 2014; 91: 84–90.

4. Ismail, Dunnigan. A region boundary-based geometric formation control scheme for multiple autonomous underwater vehicles. In: Proceedings of international conference on electrical, control and computer engineering, Pahang, Malaysia, 21–22 June 2011, pp.491–496. New York: IEEE.

5. Consolini, Morbidi, Prattichizzo, et al. Leader–follower formation control of nonholonomic mobile robots with input constraints. Automatica 2008; 44: 1343–1349.

6. Kyrkjebø, Pettersen. A virtual vehicle approach to output synchronization control. In: Proceedings of the 45th IEEE conference on decision and control, San Diego, CA, 13–15 December 2006, pp.6016–6021. New York: IEEE.

7. Huang, Liao, Shen, et al. Adaptive AUV formation strategy under acoustic communication conditions. In: Proceedings of IEEE OCEANS, Taipei, Taiwan, 7–10 April 2014, pp.216–221. New York: IEEE.

8. Liang, Kang. UUV formation system modeling and simulation research based on multi-agent interaction chain. Int J Model Simul Sci Comput 2015; 6: 1550019.

9. Matsuda, Maki, Sakamaki, et al. State estimation and compression method for the navigation of multiple autonomous underwater vehicles with limited communication traffic. IEEE J Oceanic Eng 2015; 40: 337–351.

10. Allotta, Caiti, Costanzi, et al. Cooperative navigation of AUVs via acoustic communication networking: field experience with the typhoon vehicles. Auton Robot 2016; 40: 1229–1244.

11. Cheah, Hou, Slotine JJE. Region-based shape control for a swarm of robots. Automatica 2009; 45: 2406–2411.

12. Hou, Cheah. Can a simple control scheme work for a formation control of multiple autonomous underwater vehicles. IEEE T Contr Syst T 2011; 19: 1090–1101.

13. Parlangeli, Indiveri. Single range observability for cooperative underactuated underwater vehicles. IFAC Proc Vol 2015; 40: 129–141.

14. Chen, Sun, Yang, et al. Leader-follower formation control of multiple non-holonomic mobile robots incorporating a receding-horizon scheme. Int J Robot Res 2010; 29: 727–747.

15.

Glotzbach

Schneider

Otto

. Cooperative line of sight target tracking for heterogeneous unmanned marine vehicle teams: from theory to practice. Robot Auton Syst 2015; 67: 53–60.

16.

Millán

Orihuela

Jurado

et al . Formation control of autonomous underwater vehicles subject to communication delays. IEEE T Contr Syst T 2014; 22: 770–777.

17.

Cui

How

BVE

et al . Leader-follower formation control of underactuated autonomous underwater vehicles. Ocean Eng 2010; 36: 1491–1502.

18.

Zhang

et al . Formation control of impulsive networked autonomous underwater vehicles under fixed and switching topologies. Neurocomputing 2015; 147: 291–298.

19.

Seto

. Marine robot autonomy. New York: Springer-Verlag, 2013.

20.

Galluzzo

Cosenza

. Control of a non-isothermal continuous stirred tank reactor by a feedback–feedforward structure using type-2 fuzzy logic controllers. Inform Sciences 2011; 181: 3535–3550.

21.

Tsai

Tai

. Distributed sliding-mode formation control using recurrent interval type 2 fuzzy neural networks for uncertain multi-ballbots. In: IEEE international conference on fuzzy systems, Vancouver, BC, Canada, 24–29 July 2016.New York: IEEE.

22.

Ali

Kim

. Type-2 fuzzy ontology-based semantic knowledge for collision avoidance of autonomous underwater vehicles. Inform Sciences 2015; 295: 441–464.

23.

Chen

Wang

. Support vector learning for fuzzy rule-based classification systems. IEEE T Fuzzy Syst 2003; 11: 716–728.

24.

Cheng

Juang

. An incremental support vector machine-trained TS-type fuzzy system for online classification problems. Fuzzy Set Syst 2011; 163: 24–44.

25.

Walls

Eustice

. An origin state method for communication constrained cooperative localization with robustness to packet loss. Int J Robot Res 2014; 33: 1191–1208.

26.

Xie

Chen

et al . Solution to reinforcement learning problems with artificial potential field. J Cent South Univ 2008; 15: 552–557.

27.

Zhang

Wan

et al . Optimization of S-surface controller for autonomous underwater vehicle with immune-genetic algorithm. J Harbin Inst Technol 2008; 15: 404–410.