
Abstract

This work investigates the trajectory-tracking control problem for underactuated autonomous underwater vehicles (AUVs) with unknown dynamics. To handle the unknown dynamics, an adaptive dynamic programming (ADP) scheme based on action-critic networks is combined with the backstepping approach to improve system stability and tracking accuracy. First, the backstepping approach is applied to the kinematic model of the underactuated AUV to produce a virtual velocity control, which is taken as the desired velocity input of the dynamic model. Second, the error tracking system is constructed from the dynamic model of the underactuated AUV. Third, the critic neural network and the action neural network are employed to transform the trajectory-tracking control problem into an optimal control problem based on a policy iteration algorithm. Finally, simulation results are given to verify the effectiveness of the proposed control scheme.

Keywords

Adaptive dynamic programming (ADP), backstepping approach, tracking control, autonomous underwater vehicle (AUV)

1 Introduction

The motion control of autonomous underwater vehicles (AUVs) remains an active research topic because of its wide range of applications [1]. Tracking control is one fundamental function of motion control for AUVs [2, 3]. Many control schemes have been developed to solve the trajectory-tracking problem for fully-actuated AUVs, such as the sliding mode control method [4, 5], the adaptive control method [6], observer based adaptive dynamic programming (ADP) [7], the backstepping method [8] and the dynamic surface control method [9].

However, some AUVs are underactuated: the number of independent control variables is smaller than the number of degrees of freedom (DOF). The underactuated AUV is a complex system [10]. Unknown dynamics can reduce its tracking accuracy and system stability, which makes trajectory-tracking control more difficult. These difficulties motivate this work.

Traditionally, the tracking control problems of underactuated AUVs have been addressed with a variety of control techniques. The robust control method [11] and the backstepping method [12, 13] can deal with high-order nonlinear systems. However, the backstepping method depends on an accurate model of the AUV, and the robust control method can cause chattering. The adaptive neural control method does not depend on an accurate model of the AUV. Neural networks have been introduced to solve the trajectory-tracking problem for AUVs because of their strong nonlinear fitting ability, robustness and self-learning ability. An adaptive neural network tracking controller [14] built on a radial basis function neural network was developed for an underwater vehicle with an unknown nonlinear function. A deterministic policy gradient method [15] using multi pseudo Q-learning was proposed to reduce the overestimation of the action-value function for underactuated AUVs with unknown dynamics and constrained inputs. Neural networks were employed to approximate the uncertain underactuated vessel dynamics and external disturbances [16]. A radial basis function neural network was employed to deal with the uncertain nonlinear dynamics of an underactuated vessel [17]. A neural network was designed to solve the cooperative path following problem for a fleet of underactuated AUVs with uncertain nonlinear dynamics [18]. A neural network was employed to estimate the unknown nonlinear dynamics of an AUV [19]. Like the adaptive neural control method, the fuzzy control method does not require an accurate model of the AUV. However, the fuzzy control method depends strongly on prior knowledge. A robust adaptive control scheme based on a fully-tuned fuzzy neural network was proposed to solve the trajectory and attitude tracking problem for an unmanned underwater vehicle with thruster dynamics and unknown disturbances [20]. The sliding mode method can overcome the influence of unknown disturbances on the AUV. However, it causes chattering in practical applications. Although a number of traditional control schemes have provided solutions to the tracking control problems of AUVs in different situations, more powerful control schemes are needed to meet higher requirements on the control performance of AUVs. Some researchers have begun to focus on the two-player zero-sum game, which has developed considerably in recent years.

However, the studies mentioned above usually employ a state observer to estimate the system state because of the unknown disturbances or uncertainties, which may accumulate error. Therefore, it is necessary to seek an optimal control scheme for underactuated AUVs that minimizes the performance index function and also reduces the effect of unknown dynamics. Unfortunately, few studies address adaptive dynamic programming (ADP) based trajectory-tracking control for underactuated AUVs with unknown dynamics. As a powerful optimization tool, the ADP scheme has received increasing attention [21]. The ADP scheme has been introduced to compensate for unknown dynamics, such as control input nonlinearities and model uncertainties, in nonlinear systems [22, 23]. Policy iteration [24–27] and value iteration [28, 29] are the two primary iterative ADP algorithms. The main contributions of this paper are as follows:

The virtual velocity control input is designed using the backstepping method and is taken as the reference velocity input of the dynamic model of the underactuated AUV.

The unknown dynamics [21, 30] are considered in this work. The action-critic networks based ADP is then employed to transform the trajectory-tracking control problem into an optimal control problem.

The online policy iteration algorithm and the weight update laws are designed. To verify the effectiveness of the proposed method, simulation results are compared with the method proposed in [31].

This work is organized as follows. In Section 2, the kinematic model and the dynamic model of the underactuated AUV, based on the reference AUV model in [32], are given. In Section 3, the ADP optimal controller is designed via the backstepping method. In Section 4, two simulation examples are provided. Section 5 gives the conclusions.

2 Problem formulation and mathematical model of AUV

The mathematical model of the underactuated AUV is shown in Fig. 1. Two coordinate systems are used: the universal frame {O_e - X_eY_eZ_e} and the body-fixed frame {O_b - X_bY_bZ_b}.

Fig. 1

Underactuated AUV model system.

Table 1

Notation of motion for underactuated AUV

DOF (degree of freedom)	Description	Velocity	Position and attitude
1	surge	u	x
2	sway	v	y
3	heave	w	z
4	roll	p	φ
5	pitch	q	θ
6	yaw	r	ψ

2.1 Kinematic model of AUV

The kinematic model is given as follows: $\dot{η} = J (η) ξ$ (1) where J (η) is the coordinate transformation matrix.
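The source does not spell out J (η). As an illustration only, the following Python sketch builds the standard 6-DOF Euler-angle (ZYX) transformation commonly used for marine vehicles, with η = [x, y, z, φ, θ, ψ] and ξ = [u, v, w, p, q, r] ordered as in Table 1. The function name `transformation_matrix` and the choice of this particular parameterization are assumptions and are not taken from this paper.

```python
import numpy as np


def transformation_matrix(eta):
    """Return a 6x6 J(eta) for eta = [x, y, z, phi, theta, psi] (assumed ZYX Euler form)."""
    phi, theta, psi = eta[3], eta[4], eta[5]
    cphi, sphi = np.cos(phi), np.sin(phi)
    cth, sth, tth = np.cos(theta), np.sin(theta), np.tan(theta)
    cpsi, spsi = np.cos(psi), np.sin(psi)

    # Rotation of linear velocities from the body-fixed frame to the universal frame.
    R = np.array([
        [cpsi * cth, -spsi * cphi + cpsi * sth * sphi, spsi * sphi + cpsi * cphi * sth],
        [spsi * cth, cpsi * cphi + sphi * sth * spsi, -cpsi * sphi + sth * spsi * cphi],
        [-sth, cth * sphi, cth * cphi],
    ])
    # Map from body angular rates [p, q, r] to Euler-angle rates.
    # This block is singular at theta = +/- pi/2, where J is not invertible.
    T = np.array([
        [1.0, sphi * tth, cphi * tth],
        [0.0, cphi, -sphi],
        [0.0, sphi / cth, cphi / cth],
    ])
    J = np.zeros((6, 6))
    J[:3, :3] = R
    J[3:, 3:] = T
    return J
```

With this form, `transformation_matrix(eta) @ xi` evaluates the right-hand side of (1). The singularity at θ = ±π/2 is one concrete case in which the invertibility of J assumed later in the paper would fail.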

2.2 Dynamic model of AUV

The dynamic model of underactuated AUV is established as follows: $M \dot{ξ} + C (ξ) ξ + D (ξ) ξ + g (η) + τ_{ɛ} = τ$ (2) where M, C (ξ), D (ξ), τ_ɛ, τ and g (η) are the inertia matrix, the Coriolis and centripetal matrix, the hydrodynamic damping matrix, the force vector introduced by the unknown dynamics, the thrust force vector and the gravity and buoyancy forces vector respectively.

The dynamic model (2) can be rewritten as follows: $\dot{ξ} = M^{- 1} (- C (ξ) ξ - D (ξ) ξ - g (η) - τ_{ɛ} + τ) .$ (3)
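As a minimal sketch of how (3) could be evaluated numerically, the function below takes the model terms as inputs. The name `velocity_derivative` and the convention that C, D and g are passed as Python callables are assumptions made for illustration; the paper takes the actual model matrices from its references.

```python
import numpy as np


def velocity_derivative(xi, eta, tau, M, C, D, g, tau_eps):
    """Evaluate (3): xi_dot = M^{-1} (-C(xi) xi - D(xi) xi - g(eta) - tau_eps + tau).

    M is a constant matrix, C and D are callables returning matrices,
    g is a callable returning a vector, and tau_eps is the unknown-dynamics vector.
    """
    rhs = -C(xi) @ xi - D(xi) @ xi - g(eta) - tau_eps + tau
    # Solve M xi_dot = rhs instead of forming the inverse explicitly.
    return np.linalg.solve(M, rhs)
```

Solving the linear system rather than inverting M is a numerical design choice; it gives the same result as (3) when M is invertible.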

2.3 Problem formulation

The position error is defined as follows: $e_{η} = η - η_{d} .$ (4)

The time derivative of (4) is given as follows: ${\dot{e}}_{η} = \dot{η} - {\dot{η}}_{d} = J (η) ξ - J (η_{d}) ξ_{d} .$ (5)

The velocity error is defined as follows: $e_{ξ} = ξ - ξ_{vr}$ (6) where ξ_vr is the virtual velocity control input based on the backstepping approach.

The time derivative of (6) is given as follows: ${\dot{e}}_{ξ} = \dot{ξ} - {\dot{ξ}}_{vr} .$ (7)

3 ADP based optimal control design via backstepping method

The block diagram of the proposed control scheme is shown in Fig. 2.

Fig. 2

Underactuated AUV controller block diagram.

3.1 Kinematic control via backstepping method

Considering the kinematic model of the underactuated AUV, we require e_η → 0 as t → ∞. According to (4), (5) and (6), we can get the virtual velocity control input ξ_vr, which is given as follows: $ξ_{vr} = J^{- 1} (e_{η} + η_{d}) (J (η_{d}) ξ_{d} - k_{1} e_{η})$ (8) where k₁ > 0 and J (e_η + η_d) is assumed to be invertible.
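The virtual velocity (8) can be sketched in a few lines of Python. The function name `virtual_velocity` and the convention that J is passed as a callable are assumptions for illustration. Note that e_η + η_d = η by (4), so J is evaluated at η.

```python
import numpy as np


def virtual_velocity(J, eta, eta_d, xi_d, k1):
    """Evaluate (8): xi_vr = J^{-1}(eta) (J(eta_d) xi_d - k1 e_eta), with e_eta = eta - eta_d."""
    e_eta = eta - eta_d
    # Solve J(eta) xi_vr = J(eta_d) xi_d - k1 e_eta; J(eta) is assumed invertible.
    return np.linalg.solve(J(eta), J(eta_d) @ xi_d - k1 * e_eta)
```

By construction, J(η) ξ_vr - J(η_d) ξ_d = -k₁ e_η, which is the substitution used in the Lyapunov argument of Lemma 1.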

The time derivative of (8) is given as follows: $\begin{aligned} {\dot{ξ}}_{vr} &= J^{- 1} (e_{η} + η_{d}) (\dot{J} (η_{d}) ξ_{d} + J (η_{d}) {\dot{ξ}}_{d} - \dot{J} (e_{η} + η_{d}) ξ_{vr} - k_{1} {\dot{e}}_{η}) \\ &= J^{- 1} (e_{η} + η_{d}) (\dot{J} (η_{d}) ξ_{d} + J (η_{d}) {\dot{ξ}}_{d} - \dot{J} (e_{η} + η_{d}) J^{- 1} (e_{η} + η_{d}) (J (η_{d}) ξ_{d} - k_{1} e_{η}) - k_{1} {\dot{e}}_{η}) \\ &= Θ_{1} + k_{1} J^{- 1} (e_{η} + η_{d}) \dot{J} (e_{η} + η_{d}) J^{- 1} (e_{η} + η_{d}) e_{η} - k_{1} J^{- 1} (e_{η} + η_{d}) {\dot{e}}_{η} \end{aligned}$ (9) where $Θ_{1} = J^{- 1} (e_{η} + η_{d}) (\dot{J} (η_{d}) ξ_{d} + J (η_{d}) {\dot{ξ}}_{d} - \dot{J} (e_{η} + η_{d}) J^{- 1} (e_{η} + η_{d}) J (η_{d}) ξ_{d})$ .

Lemma 1. Under the proposed virtual control input ξ_vr and the kinematic model of the underactuated AUV (1), e_η is uniformly ultimately bounded (UUB).

Proof 1. Choose the Lyapunov function as follows: $V_{1} = \frac{1}{2} e_{η}^{T} e_{η} .$ (10)

The derivative of (10) is calculated as follows: $\begin{aligned} {\dot{V}}_{1} &= e_{η}^{T} (\dot{η} - {\dot{η}}_{d}) \\ &= e_{η}^{T} (J (η) ξ - J (η_{d}) ξ_{d}) \\ &= e_{η}^{T} (J (η) ξ_{vr} - J (η_{d}) ξ_{d}) \\ &= e_{η}^{T} (J (η) J^{- 1} (η) (J (η_{d}) ξ_{d} - k_{1} e_{η}) - J (η_{d}) ξ_{d}) \\ &= e_{η}^{T} (- k_{1} e_{η}) \\ &= - k_{1} e_{η}^{T} e_{η} \\ &= - k_{1} ∥ e_{η} ∥^{2} \\ &\leq 0 . \end{aligned}$ (11)

From (11), the time derivative of V₁ is negative semi-definite. It can be concluded that V₁ is bounded and e_η is UUB. This completes the proof.
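As a hedged worked step, and only under the idealization ξ = ξ_vr used in (11), the decay rate of the position error can be made explicit. Since $V_{1} = \frac{1}{2} ∥ e_{η} ∥^{2}$, (11) gives ${\dot{V}}_{1} = - 2 k_{1} V_{1}$, so that $V_{1} (t) = V_{1} (0) e^{- 2 k_{1} t}$ and $∥ e_{η} (t) ∥ = ∥ e_{η} (0) ∥ e^{- k_{1} t}$. For example, with k₁ = 1 the norm of e_η falls below 1% of its initial value after t = ln 100 ≈ 4.6 time units. This estimate describes the kinematic loop alone; it ignores the velocity error e_ξ handled by the dynamic loop.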

According to (4)-(9), the dynamic model (2) can be transformed as follows: $\begin{aligned} τ &= M {\dot{e}}_{ξ} - k_{1} M J^{- 1} (e_{η} + η_{d}) {\dot{e}}_{η} + (C (e_{ξ} + ξ_{vr}) + D (e_{ξ} + ξ_{vr})) e_{ξ} \\ &\quad + k_{1} (M J^{- 1} (e_{η} + η_{d}) \dot{J} (e_{η} + η_{d}) J^{- 1} (e_{η} + η_{d}) - (C (e_{ξ} + ξ_{vr}) + D (e_{ξ} + ξ_{vr})) J^{- 1} (e_{η} + η_{d})) e_{η} \\ &\quad + Θ_{2} + τ_{ɛ} \end{aligned}$ (12) where Θ₂ = MΘ₁ + (C (e_ξ + ξ_vr) + D (e_ξ + ξ_vr)) J^-1 (e_η + η_d) J (η_d) ξ_d + g (e_η + η_d).

Let Λ = [e_ξ e_η] ^T. The error tracking system is then obtained as follows: $A \dot{Λ} + B Λ + E = Γ$ (13) where $A = \begin{bmatrix} M & - k_{1} M J^{- 1} (e_{η} + η_{d}) \\ 0^{6 \times 6} & I^{6 \times 6} \end{bmatrix}$ ; $Γ = \begin{bmatrix} τ_{e} \\ 0^{6 \times 1} \end{bmatrix}$ ; $Θ_{3} = Θ_{2} + τ_{ɛ} - M {\dot{ξ}}_{d} - C (ξ_{d}) ξ_{d} - D (ξ_{d}) ξ_{d} - g (η_{d})$ ; $E = \begin{bmatrix} Θ_{3} \\ J (η_{d}) ξ_{d} - J (η) J^{- 1} (η) J (η_{d}) ξ_{d} \end{bmatrix}$ ; $B = \begin{bmatrix} C (ξ) + D (ξ) & Θ_{4} \\ - J (η) & k_{1} J (η) J^{- 1} (η) \end{bmatrix}$ ; $Θ_{4} = k_{1} (M J^{- 1} (e_{η} + η_{d}) \dot{J} (e_{η} + η_{d}) J^{- 1} (e_{η} + η_{d}) - (C (e_{ξ} + ξ_{vr}) + D (e_{ξ} + ξ_{vr})) J^{- 1} (e_{η} + η_{d}))$ ; τ_e = τ - τ_d and $τ_{d} = M {\dot{ξ}}_{d} + C (ξ_{d}) ξ_{d} + D (ξ_{d}) ξ_{d} + g (η_{d})$ .
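To make the block structure of (13) concrete, the sketch below assembles A from M, J^-1 and k₁ and solves (13) for the error derivative. The function names `assemble_A` and `error_derivative` are hypothetical, and B, E and Γ are assumed to be supplied already evaluated at the current state.

```python
import numpy as np


def assemble_A(M, J_inv, k1):
    """Build A = [[M, -k1 M J^{-1}], [0, I]] as defined below (13)."""
    n = M.shape[0]
    return np.block([
        [M, -k1 * M @ J_inv],
        [np.zeros((n, n)), np.eye(n)],
    ])


def error_derivative(A, B, E, Lam, Gam):
    """Solve (13) for Lam_dot: Lam_dot = A^{-1} (Gam - B Lam - E)."""
    return np.linalg.solve(A, Gam - B @ Lam - E)
```

Because A is block upper triangular with an identity block in the lower right, det A = det M, so A is invertible whenever the inertia matrix M is. This is what allows A^-1 to appear in the expressions of Section 3.2.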

3.2 ADP based optimal control design

The performance index function is defined as follows: $V_{2} (Λ (t), Γ) = \int_{t}^{\infty} e^{γ (t - σ)} U (Λ (σ), Γ (σ)) d σ$ (14) where γ is the discount factor and 0 ≤ γ < 1; U (Λ (σ) , Γ (σ)) is the utility function and U (Λ (σ) , Γ (σ)) = Λ^T (σ) QΛ (σ) + Γ^T (σ) RΓ (σ); Q and R are the positive definite matrices.
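For intuition about (14), the following sketch approximates the discounted performance index on a finite horizon from sampled trajectories, using the trapezoidal rule. The function name `discounted_cost`, the sampled-data inputs and the truncation to a finite horizon are all assumptions for illustration. Since σ ≥ t, the factor e^{γ (t - σ)} is a decaying weight.

```python
import numpy as np


def discounted_cost(ts, Lams, Gams, Q, R, gamma):
    """Finite-horizon trapezoidal approximation of (14), starting at t = ts[0].

    ts: sample times, shape (N,); Lams: shape (N, n); Gams: shape (N, m).
    """
    ts = np.asarray(ts, dtype=float)
    # Utility U = Lam^T Q Lam + Gam^T R Gam at every sample.
    U = (np.einsum("ni,ij,nj->n", Lams, Q, Lams)
         + np.einsum("ni,ij,nj->n", Gams, R, Gams))
    # Discount weight e^{gamma (t - sigma)} with t = ts[0].
    weight = np.exp(gamma * (ts[0] - ts))
    f = weight * U
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(ts)))
```

For a constant utility U = 1 the exact value over [0, T] is (1 - e^{-γT}) / γ, which gives a quick sanity check of the discretization.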

If the performance index function (14) is continuously differentiable, then the Hamilton-Jacobi-Bellman (HJB) function can be derived as follows: $\begin{aligned} H (Λ, Γ, \nabla V_{2} (Λ)) &= U (Λ, Γ) + (\nabla V_{2} (Λ))^{T} (- A^{- 1} (B Λ + E - Γ)) - γ V_{2} (Λ) \\ &= 0 \end{aligned}$ (15) where $\nabla V_{2} (Λ) = \nabla V_{2} (Λ (t), Γ) = \frac{\partial V_{2}}{\partial Λ}$ and V₂ (0) = 0.

According to Bellman's optimality principle, the optimal performance index function can be represented as follows: $V_{2}^{*} (Λ) = \min_{Γ \in Ω_{Γ}} \int_{t}^{\infty} e^{γ (t - σ)} U (Λ (σ), Γ (σ)) d σ .$ (16)

The optimal performance index function (16) satisfies the HJB function (15) as follows: $\min_{Γ} H (Λ, Γ, \nabla V_{2}^{*} (Λ)) = 0 .$ (17)

The optimal control law is $Γ^{*} (Λ) = - \frac{1}{2} R^{- 1} (A^{- 1})^{T} \nabla V_{2}^{*} (Λ) .$ (18)
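As a short worked step, the expression (18) follows from the stationarity condition of the Hamiltonian (15) with respect to Γ. Assuming R is symmetric, differentiating $U (Λ, Γ) = Λ^{T} Q Λ + Γ^{T} R Γ$ and the term $(\nabla V_{2}^{*} (Λ))^{T} A^{- 1} Γ$ gives $\frac{\partial H}{\partial Γ} = 2 R Γ + (A^{- 1})^{T} \nabla V_{2}^{*} (Λ) = 0$, and solving for Γ yields $Γ^{*} (Λ) = - \frac{1}{2} R^{- 1} (A^{- 1})^{T} \nabla V_{2}^{*} (Λ)$. Because $\frac{\partial^{2} H}{\partial Γ^{2}} = 2 R$ is positive definite, this stationary point is the minimizer required in (17).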

Considering the optimal law (18), the HJB function can be transformed as follows: $\begin{aligned} 0 &= U (Λ, Γ^{*}) + (\nabla V_{2}^{*} (Λ))^{T} (- A^{- 1} (B Λ + E - Γ^{*})) - γ V_{2}^{*} (Λ) \\ &= Λ^{T} Q Λ - \frac{1}{4} (\nabla V_{2}^{*} (Λ))^{T} A^{- 1} R^{- 1} (A^{- 1})^{T} \nabla V_{2}^{*} (Λ) + (\nabla V_{2}^{*} (Λ))^{T} (- A^{- 1} (B Λ + E)) - γ V_{2}^{*} (Λ) \end{aligned}$ (19) where $V_{2}^{*} (0) = 0$ .

Remark 1. In (13), Γ is taken as the control law with unknown dynamics. So, an action-critic networks based ADP is employed. The critic network is designed to estimate the optimal performance index function (16) and the action network is designed to estimate the optimal control law (18).

3.3 Critic network design

The critic network is designed to approximate $V_{2}^{*} (Λ)$ as follows: $V_{2}^{*} (Λ) = W_{1}^{T} ϖ_{1} (Λ) + δ_{1} (Λ)$ (20) where W₁ is the unknown ideal constant weight of the critic network; ϖ₁ (Λ) is the activation function vector of the critic network; and δ₁ (Λ) is the approximation error of the critic network.

The derivative of equation (20) is represented as follows: $\nabla V_{2}^{*} (Λ) = \nabla ϖ_{1}^{T} (Λ) W_{1} + \nabla δ_{1} (Λ)$ (21) where $\nabla ϖ_{1} (Λ) = \frac{\partial ϖ_{1} (Λ)}{\partial Λ}$ and $\nabla δ_{1} (Λ) = \frac{\partial δ_{1} (Λ)}{\partial Λ}$ .

Let ${\hat{W}}_{1}$ be the approximation of W₁, the approximation of $V_{2}^{*} (Λ)$ can be represented as follows: ${\hat{V}}_{2} (Λ) = {\hat{W}}_{1}^{T} ϖ_{1} (Λ) .$ (22)
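The paper does not state which activation functions ϖ₁ are used. Purely as an illustration, the sketch below assumes a quadratic basis made of the monomials Λ_i Λ_j with i ≤ j, and evaluates the critic output (22) together with the Jacobian ∇ϖ₁ that appears in (21). The names `quadratic_basis`, `quadratic_basis_jacobian` and `critic_value` are hypothetical.

```python
import numpy as np


def quadratic_basis(Lam):
    """Assumed activation vector: all products Lam_i * Lam_j with i <= j."""
    i, j = np.triu_indices(Lam.size)
    return Lam[i] * Lam[j]


def quadratic_basis_jacobian(Lam):
    """Jacobian d(varpi_1)/d(Lam), shape (number of basis functions, Lam.size)."""
    i, j = np.triu_indices(Lam.size)
    rows = np.arange(i.size)
    Jac = np.zeros((i.size, Lam.size))
    # d(Lam_i Lam_j)/d Lam_i = Lam_j and d(Lam_i Lam_j)/d Lam_j = Lam_i;
    # for i == j both contributions add up to 2 Lam_i.
    Jac[rows, i] += Lam[j]
    Jac[rows, j] += Lam[i]
    return Jac


def critic_value(W1_hat, Lam):
    """Critic output (22): V2_hat = W1_hat^T varpi_1(Lam)."""
    return float(W1_hat @ quadratic_basis(Lam))
```

A quadratic basis vanishes at Λ = 0, so the approximation automatically satisfies the condition V₂ (0) = 0 stated with (15). Other bases are equally possible.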

Then, the HJB function can be derived as follows: $\begin{aligned} H (Λ, \hat{Γ}, {\hat{W}}_{1}) &= {\hat{W}}_{1}^{T} \nabla ϖ_{1} (Λ) (- A^{- 1} (B Λ + E - \hat{Γ})) + U (Λ, \hat{Γ}) - γ {\hat{W}}_{1}^{T} ϖ_{1} (Λ) \\ &= e_{w_{1}} . \end{aligned}$ (23)

The squared residual error $e_{1} ({\hat{W}}_{1})$ is defined as follows: $e_{1} ({\hat{W}}_{1}) = \frac{1}{2} e_{w_{1}}^{T} e_{w_{1}} .$ (24)

Given any admissible control law $\hat{Γ}$ , it is desired to select ${\hat{W}}_{1}$ to minimize $e_{1} ({\hat{W}}_{1})$ . The weight update law based on the gradient descent algorithm is given as follows: ${\dot{\hat{W}}}_{1} = - α_{1} ϱ_{1} (ϱ_{1}^{T} {\hat{W}}_{1} + Λ^{T} Q Λ + {\hat{Γ}}^{T} R \hat{Γ})$ (25) where α₁ is the adaptive gain of the critic network and α₁ > 0; $ϱ_{1} = \frac{ϱ}{ϱ^{T} ϱ + 1}$ and $ϱ = \nabla ϖ_{1} (Λ) (- A^{- 1} (B Λ + E - \hat{Γ})) - γ ϖ_{1} (Λ)$ .

According to the definition of ϱ₁, there exist positive constants ϱ_1M > 1 and ϱ_1m > 0 such that ϱ_1m ≤ ∥ ϱ₁ ∥ ≤ ϱ_1M.
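The critic update (25) can be sketched as a function that returns the weight derivative, to be integrated with any ODE scheme (for example a forward Euler step). The function name `critic_weight_rate` and the argument layout are assumptions; the formula follows (25) as printed, including the use of the normalized vector ϱ₁ inside the bracket.

```python
import numpy as np


def critic_weight_rate(W1_hat, varpi1, grad_varpi1, Lam, Gam_hat, A, B, E, Q, R, gamma, alpha1):
    """Right-hand side of the critic update law (25).

    varpi1: activation vector, shape (N,); grad_varpi1: its Jacobian, shape (N, n).
    """
    # Error dynamics from (13): Lam_dot = -A^{-1} (B Lam + E - Gam_hat).
    Lam_dot = -np.linalg.solve(A, B @ Lam + E - Gam_hat)
    # rho and its normalized form rho_1, as defined below (25).
    rho = grad_varpi1 @ Lam_dot - gamma * varpi1
    rho1 = rho / (rho @ rho + 1.0)
    # Utility U = Lam^T Q Lam + Gam_hat^T R Gam_hat.
    utility = Lam @ Q @ Lam + Gam_hat @ R @ Gam_hat
    return -alpha1 * rho1 * (rho1 @ W1_hat + utility)
```

A forward Euler step would then read `W1_hat = W1_hat + dt * critic_weight_rate(...)`, where the step size `dt` is a further assumption not given in the paper.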

Let ${\tilde{W}}_{1} = {\hat{W}}_{1} - W_{1}$ denote the weight estimation error of the critic network. Equation (25) can be transformed as follows: $\begin{aligned} {\dot{\tilde{W}}}_{1} &= {\dot{\hat{W}}}_{1} - {\dot{W}}_{1} \\ &= - α_{1} ϱ_{1} (ϱ_{1}^{T} {\tilde{W}}_{1} - \nabla δ_{1} (Λ) (- A^{- 1} (B Λ + E - \hat{Γ}))) \\ &= - α_{1} ϱ_{1} (ϱ_{1}^{T} {\tilde{W}}_{1} + ɛ_{w 1}) \end{aligned}$ (26) where $ɛ_{w 1} = - \nabla δ_{1} (Λ) (- A^{- 1} (B Λ + E - \hat{Γ}))$ .

3.4 Action network design

The action network is designed to approximate Γ^* as follows: $Γ^{*} = W_{2}^{T} ϖ_{2} (Λ) + δ_{2} (Λ)$ (27) where W₂ is the unknown ideal constant weight of action network; ϖ₂ (Λ) is the activation function vector of action network; δ₂ (Λ) is the approximation error of action network.

Let ${\hat{W}}_{2}$ be the approximation of W₂, the actual output can be expressed as follows: $\hat{Γ} = {\hat{W}}_{2}^{T} ϖ_{2} (Λ) .$ (28)

According to (18) and (28), the feedback error is defined as follows: $e_{w_{2}} = {\hat{W}}_{2}^{T} ϖ_{2} + \frac{1}{2} R^{- 1} (A^{- 1})^{T} \nabla ϖ_{1}^{T} (Λ) {\hat{W}}_{1} .$ (29)

The square residual error $e_{2} ({\hat{W}}_{2})$ is defined as follows: $e_{2} ({\hat{W}}_{2}) = \frac{1}{2} e_{w_{2}}^{T} e_{w_{2}} .$ (30)

It is desired to select ${\hat{W}}_{2}$ to minimize the objective function (30). The weight update law based on the gradient descent algorithm is given as follows: ${\dot{\hat{W}}}_{2} = - α_{2} ϖ_{2} (Λ) (ϖ_{2}^{T} (Λ) {\hat{W}}_{2} + \frac{1}{2} R^{- 1} A^{- 1} \nabla ϖ_{1}^{T} (Λ) {\hat{W}}_{1})^{T}$ (31) where α₂ is the adaptive gain of the action network and α₂ > 0.
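The action update can be sketched in the same way. The printed forms of (29) and (31) differ in where the transpose of A^-1 appears, so the sketch below adopts one reading as an assumption: it takes the feedback error e_{w_2} exactly as in (29) and applies plain gradient descent on (30), which gives the outer product of ϖ₂ and e_{w_2}. The function name `action_weight_rate` is hypothetical.

```python
import numpy as np


def action_weight_rate(W2_hat, varpi2, W1_hat, grad_varpi1, A, R, alpha2):
    """Gradient-descent rate for the action weights, using e_w2 from (29).

    W2_hat: shape (N2, m); varpi2: shape (N2,); grad_varpi1: shape (N1, m); W1_hat: shape (N1,).
    """
    # Actor output (28).
    Gam_hat = W2_hat.T @ varpi2
    # Control implied by the critic via (18): -1/2 R^{-1} (A^{-1})^T grad_varpi1^T W1_hat.
    critic_control = -0.5 * np.linalg.solve(R, np.linalg.solve(A.T, grad_varpi1.T @ W1_hat))
    # Feedback error (29).
    e_w2 = Gam_hat - critic_control
    # Gradient of 0.5 * e_w2^T e_w2 with respect to W2_hat is outer(varpi2, e_w2).
    return -alpha2 * np.outer(varpi2, e_w2)
```

Under this reading the rate is zero exactly when the actor output matches the control implied by the current critic, which is the fixed point that the minimization of (30) aims for.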

With ${\tilde{W}}_{2} = {\hat{W}}_{2} - W_{2}$ denoting the weight estimation error of the action network, the weight update law (31) can be rewritten as follows: $\begin{aligned} {\dot{\tilde{W}}}_{2} &= {\dot{\hat{W}}}_{2} - {\dot{W}}_{2} \\ &= - α_{2} ϖ_{2} (Λ) (ϖ_{2}^{T} (Λ) {\tilde{W}}_{2} + R^{- 1} A^{- 1} \frac{\nabla δ_{1} (Λ)}{2} + \frac{1}{2} R^{- 1} A^{- 1} \nabla ϖ_{1}^{T} (Λ) {\tilde{W}}_{1} - δ_{2} (Λ)) \\ &= - α_{2} ϖ_{2} (Λ) (ϖ_{2}^{T} (Λ) {\tilde{W}}_{2} + ɛ_{w 2} + \frac{1}{2} R^{- 1} A^{- 1} \nabla ϖ_{1}^{T} (Λ) {\tilde{W}}_{1}) \end{aligned}$ (32) where $ɛ_{w 2} = - (δ_{2} (Λ) - R^{- 1} A^{- 1} \frac{\nabla δ_{1} (Λ)}{2})$ .

3.5 Stability analysis

According to the optimal control $\hat{Γ}$ (28) and the error tracking system (13), we have $A \dot{Λ} + B Λ + E = {\hat{W}}_{2}^{T} ϖ_{2} (Λ) .$ (33)

The error tracking system (33) can be transformed using equation (27) as follows: $A \dot{Λ} + B Λ + E = {\tilde{W}}_{2}^{T} ϖ_{2} (Λ) + Γ^{*} - δ_{2} (Λ) .$ (34)

Assumption 1 The unknown ideal constant weights W₁ and W₂ satisfy that ∥W₁ ∥ ≤ W_1M and ∥W₂ ∥ ≤ W_2M respectively. W_1M and W_2M are the positive constants.

Assumption 2 The approximation errors of the networks δ₁ and δ₂ satisfy that δ_1m ≤ ∥ δ₁ ∥ ≤ δ_1M and δ_2m ≤ ∥ δ₂ ∥ ≤ δ_2M respectively. δ_1m, δ_1M, δ_2m and δ_2M are the positive constants.

Assumption 3 The activation function vectors ϖ₁ and ϖ₂ satisfy that ϖ_1m ≤ ∥ ϖ₁ ∥ ≤ ϖ_1M and ϖ_2m ≤ ∥ ϖ₂ ∥ ≤ ϖ_2M respectively. ϖ_1m, ϖ_1M, ϖ_2m and ϖ_2M are the positive constants.

Assumption 4 The gradient of activation function vector of critic network ∇ϖ₁ satisfies that ${\dot{ϖ}}_{1 m} \leq ∥ \nabla ϖ_{1} ∥ \leq {\dot{ϖ}}_{1 M}$ . ${\dot{ϖ}}_{1 m}$ and ${\dot{ϖ}}_{1 M}$ are the positive constants.

Theorem 5. For the error tracking system (13), suppose that Assumptions 1-4 hold, that the optimal cost function and the optimal control law are given by (16) and (18), and that the weight update laws of the critic neural network and the action neural network are given by (25) and (31). Then the tracking error Λ and the weight estimation errors ${\tilde{W}}_{1}$ and ${\tilde{W}}_{2}$ are uniformly ultimately bounded (UUB).

Proof 2. Let us choose a Lyapunov function candidate composed of three positive terms $L (t) = L_{1} (t) + L_{2} (t) + L_{3} (t)$ (35) where $L_{1} (t) = \frac{1}{2 α_{1}} \operatorname{tr} \{ {\tilde{W}}_{1}^{T} {\tilde{W}}_{1} \}$ ; $L_{2} (t) = \frac{1}{2 α_{2}} \operatorname{tr} \{ {\tilde{W}}_{2}^{T} {\tilde{W}}_{2} \}$ ; L₃ (t) = Λ^TΛ + V₂ (Λ, Γ).

The derivative of the Lyapunov function candidate (35) along the trajectories of the error tracking system (34) is given as follows: $\dot{L} (t) = {\dot{L}}_{1} (t) + {\dot{L}}_{2} (t) + {\dot{L}}_{3} (t) .$ (36)

According to (26), ${\dot{L}}_{1}$ can be represented as follows: $\begin{aligned} {\dot{L}}_{1} (t) &= \frac{1}{α_{1}} \operatorname{tr} \{ {\tilde{W}}_{1}^{T} {\dot{\tilde{W}}}_{1} \} \\ &= \frac{1}{α_{1}} \operatorname{tr} \{ {\tilde{W}}_{1}^{T} (- α_{1} ϱ_{1} (ϱ_{1}^{T} {\tilde{W}}_{1} + ɛ_{w 1})) \} \\ &= \frac{1}{α_{1}} \operatorname{tr} \{ - α_{1} ({\tilde{W}}_{1}^{T} ϱ_{1} ϱ_{1}^{T} {\tilde{W}}_{1} + {\tilde{W}}_{1}^{T} ϱ_{1} ɛ_{w 1}) \} \\ &\leq - \frac{3}{4} ∥ ϱ_{1} ∥^{2} ∥ {\tilde{W}}_{1} ∥^{2} + ∥ ɛ_{w 1} ∥^{2} \\ &\leq - \frac{3}{4} ϱ_{1 m}^{2} ∥ {\tilde{W}}_{1} ∥^{2} + ∥ ɛ_{w 1} ∥^{2} . \end{aligned}$ (37)

Hence, ${\dot{L}}_{1} < 0$ if $∥ {\tilde{W}}_{1} ∥ > \frac{2 ∥ ɛ_{w 1} ∥}{\sqrt{3} ϱ_{1 m}}$ .

According to equation (32), ${\dot{L}}_{2}$ can be represented as follows: $\begin{aligned} {\dot{L}}_{2} (t) &= \frac{1}{α_{2}} \operatorname{tr} \{ {\tilde{W}}_{2}^{T} {\dot{\tilde{W}}}_{2} \} \\ &= \frac{1}{α_{2}} \operatorname{tr} \{ {\tilde{W}}_{2}^{T} (- α_{2} ϖ_{2} (Λ) (ϖ_{2}^{T} (Λ) {\tilde{W}}_{2} + \frac{1}{2} R^{- 1} A^{- 1} \nabla ϖ_{1}^{T} (Λ) {\tilde{W}}_{1} + ɛ_{w 2})) \} \\ &= - \operatorname{tr} \{ {\tilde{W}}_{2}^{T} ϖ_{2} (Λ) ϖ_{2}^{T} (Λ) {\tilde{W}}_{2} + \frac{1}{2} {\tilde{W}}_{2}^{T} R^{- 1} A^{- 1} \nabla ϖ_{1}^{T} (Λ) {\tilde{W}}_{1} + {\tilde{W}}_{2}^{T} ɛ_{w 2} \} \\ &\leq - ∥ ϖ_{2} ∥^{2} ∥ {\tilde{W}}_{2} ∥^{2} + 2 ∥ {\tilde{W}}_{2} ∥^{2} + ∥ R ∥^{- 2} ∥ A ∥^{- 2} ∥ \nabla ϖ_{1} (Λ) ∥^{2} ∥ {\tilde{W}}_{1} ∥^{2} + \frac{1}{4} ∥ ɛ_{w 2} ∥^{2} \\ &\leq - (ϖ_{2 m}^{2} - 2) ∥ {\tilde{W}}_{2} ∥^{2} + \frac{1}{4} ∥ ɛ_{w 2} ∥^{2} + {\dot{ϖ}}_{1 M}^{2} ∥ R ∥^{- 2} ∥ A ∥^{- 2} ∥ {\tilde{W}}_{1} ∥^{2} . \end{aligned}$ (38)

Hence, ${\dot{L}}_{2} < 0$ if $∥ {\tilde{W}}_{2} ∥ > \sqrt{\frac{Θ_{5}}{ϖ_{2 m}^{2} - 2}}$ , where $Θ_{5} = {\dot{ϖ}}_{1 M}^{2} ∥ R ∥^{- 2} ∥ A ∥^{- 2} ∥ {\tilde{W}}_{1} ∥^{2} + \frac{1}{4} ∥ ɛ_{w 2} ∥^{2}$ and $ϖ_{2 m} > \sqrt{2}$ .

${\dot{L}}_{3}$ is given as follows: $\begin{aligned} {\dot{L}}_{3} (t) &= Λ^{T} \dot{Λ} - Λ^{T} Q Λ - Γ^{* T} R Γ^{*} \\ &= Λ^{T} (A^{- 1} (- B Λ - E + {\tilde{W}}_{2}^{T} ϖ_{2} (Λ) + Γ^{*} - δ_{2} (Λ))) - Λ^{T} Q Λ - Γ^{* T} R Γ^{*} \\ &= - Λ^{T} A^{- 1} B Λ - Λ^{T} A^{- 1} E + Λ^{T} A^{- 1} {\tilde{W}}_{2}^{T} ϖ_{2} (Λ) + Λ^{T} A^{- 1} Γ^{*} - Λ^{T} A^{- 1} δ_{2} - Λ^{T} Q Λ - Γ^{* T} R Γ^{*} \\ &\leq - ∥ A ∥^{- 1} ∥ B ∥ ∥ Λ ∥^{2} + 4 ∥ Λ ∥^{2} + \frac{1}{4} ∥ A ∥^{- 2} ∥ E ∥^{2} + \frac{1}{4} ϖ_{2 M}^{2} ∥ A ∥^{- 2} ∥ {\tilde{W}}_{2} ∥^{2} + \frac{1}{4} ∥ A ∥^{- 2} ∥ Γ^{*} ∥^{2} + \frac{1}{4} δ_{2 M}^{2} ∥ A ∥^{- 2} - λ_{\min} (Q) ∥ Λ ∥^{2} - λ_{\min} (R) ∥ Γ^{*} ∥^{2} \\ &= - (∥ A ∥^{- 1} ∥ B ∥ + λ_{\min} (Q) - 4) ∥ Λ ∥^{2} - (λ_{\min} (R) - \frac{1}{4} ∥ A ∥^{- 2}) ∥ Γ^{*} ∥^{2} + \frac{1}{4} ∥ A ∥^{- 2} ∥ E ∥^{2} + \frac{1}{4} ϖ_{2 M}^{2} ∥ A ∥^{- 2} ∥ {\tilde{W}}_{2} ∥^{2} + \frac{1}{4} δ_{2 M}^{2} ∥ A ∥^{- 2} . \end{aligned}$ (39)

Hence, ${\dot{L}}_{3} < 0$ if $λ_{\min} (Q) > 4 - ∥ A ∥^{- 1} ∥ B ∥$ , $λ_{\min} (R) > \frac{1}{4} ∥ A ∥^{- 2}$ and $∥ Λ ∥ > \sqrt{\frac{\frac{1}{4} ∥ A ∥^{- 2} ∥ E ∥^{2} + Θ_{6}}{∥ A ∥^{- 1} ∥ B ∥ + λ_{\min} (Q) - 4}}$ , where $Θ_{6} = \frac{1}{4} ϖ_{2 M}^{2} ∥ A ∥^{- 2} ∥ {\tilde{W}}_{2} ∥^{2} + \frac{1}{4} δ_{2 M}^{2} ∥ A ∥^{- 2}$ .

Then, we can get $\dot{L} (t) = {\dot{L}}_{1} + {\dot{L}}_{2} + {\dot{L}}_{3} < 0 .$ (40)

Therefore, it can be concluded that the tracking error Λ and the neural network weight estimation errors ${\tilde{W}}_{1}$ and ${\tilde{W}}_{2}$ are UUB. This completes the proof.

4 Simulation

The model parameter values of the underactuated AUV are given in [2]. The numerical values of the parameters used in the simulations are $τ_{d}^{'} = [500, 0, 0, 100, 10]^{T}$ , γ = 0.9, α₁ = 0.01, α₂ = 0.01, and k₁ = 1. To verify the effectiveness of the proposed control technique, two simulation examples, one without unknown dynamics and one with unknown dynamics, are performed and compared with the traditional ADP scheme without the backstepping method.

4.1 Example one without unknown dynamics

In this section, the simulation results of the underactuated AUV without unknown dynamics are shown in Figs. 3–9.

Fig. 3

e_ξ with method proposed in this work.

Fig. 4

e_ξ with ADP without backstepping method.

Fig. 5

e_η with method proposed in this work.

Fig. 6

e_η with ADP without backstepping method.

Fig. 7

Tracking of desired velocity.

Fig. 8

Tracking of desired position.

Fig. 9

Tracking of desired trajectory.

Figures 3 and 4 show e_ξ of the underactuated AUV. The initial value of e_ξ is [0.25, 0, 0, 0.01, 0.01] ^T. Figures 5 and 6 show e_η of the underactuated AUV. The initial value of e_η is [0.01, 0, 0, - 0.01, - 0.01] ^T. The convergence times of e_ξ and e_η with the proposed method are shown in Table 2. These simulation results show that e_ξ and e_η converge to 0 with the proposed method, whereas e_ξ and e_η do not converge to 0 with the ADP scheme without the backstepping method due to the actuator saturation. In addition, e_ξ and e_η converge faster with the proposed method.

Table 2

Convergence time of tracking error

Tracking error	Convergence time with the proposed method (s)
error of u / e_ξ (1)	12
error of v / e_ξ (2)	27
error of w / e_ξ (3)	38
error of q / e_ξ (4)	37
error of r / e_ξ (5)	26
error of x / e_η (1)	19
error of y / e_η (2)	29
error of z / e_η (3)	36
error of θ / e_η (4)	25
error of ψ / e_η (5)	23
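The paper does not state the criterion used to read the convergence times off the error curves. One possible criterion, given here only as an assumption, is the first sample time after which the absolute error stays below a tolerance for the rest of the record. The function name `convergence_time` and the default tolerance are hypothetical.

```python
import numpy as np


def convergence_time(ts, err, tol=1e-3):
    """Return the first sample time after which |err| stays below tol, or None if it never does."""
    ts = np.asarray(ts, dtype=float)
    outside = np.abs(np.asarray(err, dtype=float)) >= tol
    if not outside.any():
        # The error is within tolerance for the whole record.
        return float(ts[0])
    last = int(np.flatnonzero(outside)[-1])
    if last == len(ts) - 1:
        # The error is still outside the tolerance at the final sample.
        return None
    return float(ts[last + 1])
```

Applied to each component of e_ξ and e_η, a criterion of this kind would produce one number per row of the table; the values it yields depend on the chosen tolerance.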

The tracking of the desired velocity of the underactuated AUV with the two control schemes is given in Fig. 7. Figure 8 shows the tracking of the desired position of the underactuated AUV with the two control schemes. Figure 9 illustrates the spatial trajectories of the underactuated AUV with the two control schemes. These simulation results show that the proposed method has better trajectory-tracking accuracy.

4.2 Example two with unknown dynamics

In this section the unknown dynamics is considered.

Figures 10 and 11 show e_ξ of the underactuated AUV. The initial value of e_ξ is [0.25, 0, 0, 0.01, 0.01] ^T. Figures 12 and 13 show e_η of the underactuated AUV. The initial value of e_η is [0.01, 0, 0, - 0.01, - 0.01] ^T. From Figs. 10 and 12, we can see that e_ξ and e_η converge to 0 with the proposed method. Due to the unknown dynamics, there is small bounded chattering during convergence. The convergence times of the tracking errors with the proposed method are shown in Table 3. Compared with example one, the convergence is a little slower due to the unknown dynamics. The time response of the underactuated AUV shows that the proposed control method maintains its tracking performance in the presence of unknown dynamics.

Fig. 10

e_ξ with method proposed in this work.

Fig. 11

e_ξ with ADP without backstepping method.

Fig. 12

e_η with method proposed in this paper.

Fig. 13

e_η with ADP without backstepping method.

Table 3

Convergence time of tracking error

Tracking error	Convergence time with the proposed method (s)
error of u / e_ξ (1)	12
error of v / e_ξ (2)	20
error of w / e_ξ (3)	40
error of q / e_ξ (4)	40
error of r / e_ξ (5)	40
error of x / e_η (1)	19
error of y / e_η (2)	30
error of z / e_η (3)	52
error of θ / e_η (4)	55
error of ψ / e_η (5)	50

The tracking of the desired velocity of the underactuated AUV with the two control schemes is given in Fig. 14. Figure 15 shows the tracking of the desired position of the underactuated AUV with the two control schemes. Figure 16 illustrates the spatial trajectories of the underactuated AUV with the two control schemes. These simulation results show that the proposed method has better trajectory-tracking accuracy.

Fig. 14

Tracking of desired velocity.

Fig. 15

Tracking of desired position.

Fig. 16

Tracking of desired trajectory.

5 Conclusions

The stability of the error tracking system (34) is guaranteed based on the Lyapunov stability theorem. The simulation results show better convergence of the error tracking system (34) than the ADP scheme without the backstepping method. The proposed control scheme achieves good tracking performance.

At the same time, the proposed method does not yet consider the limitations caused by uncertainties. Future research will concentrate on improving tracking accuracy and stability for cooperative tracking control problems of multiple AUVs with uncertainties. In addition, deep reinforcement learning will be considered in future studies.

Declarations

Funding: There is no funding to support this work.

Conflicts of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this manuscript.

Authors' contributions: G. Che designed the control method, performed the simulation experiments and wrote the manuscript; Z. Yu designed the control method and analyzed the stability of the system.

References

1. Che, Liu and Yu, An improved ant colony optimization algorithm based on particle swarm optimization algorithm for path planning of autonomous underwater vehicle, J Ambient Intell Human Comput 11 (2020), 3349–3354.

2. Che, Liu and Yu, Nonlinear trajectory-tracking control for autonomous underwater vehicle based on iterative dynamic programming, J Intell Fuzzy Syst 37 (2019), 4205–4215.

3. Che and Hu, Optimal trajectory-tracking control for underactuated AUV with unknown disturbances via single network based adaptive dynamic programming, J Ambient Intell Human Comput (2022), https://doi.org/10.1007/s12652-022-04435-2.

4. Liu, Zhang, Chen and Yin, Trajectory tracking with quaternion-based attitude representation for autonomous underwater vehicle based on terminal sliding mode control, Appl Ocean Res (2020), https://doi.org/10.1016/j.apor.2020.102342.

5. Sun, Zong, Cui and Shi, Fixed-time sliding mode output feedback tracking control for autonomous underwater vehicle with prescribed performance constraint, Ocean Eng (2022), https://doi.org/10.1016/j.oceaneng.2022.110673.

6. Liu, Zong and Wang, Adaptive region tracking control with prescribed transient performance for autonomous underwater vehicle with thruster fault, Ocean Eng (2019), https://doi.org/10.1016/j.oceaneng.2019.106804.

7. Che and Yu, Neural-network estimators based fault-tolerant tracking control for AUV via ADP with rudders faults and ocean current disturbance, Neurocomputing 411 (2020), 442–454.

8. Kadiyam, Parashar, Mohan and Deshmukh, Actuator fault-tolerant control study of an underwater robot with four rotatable thruster, Ocean Eng (2020), https://doi.org/10.1016/j.oceaneng.2020.106929.

9. von Ellenrieder K.D., Dynamic surface control of trajectory tracking marine vehicles with actuator magnitude and rate limits, Automatica 105 (2019), 433–442.

10. Che, Single critic network based fault-tolerant tracking control for underactuated AUV with actuator fault, Ocean Eng (2022), https://doi.org/10.1016/j.oceaneng.2022.111380.

11. Xia, Sun, Zhao, Sun and Xia, Robust cooperative trajectory tracking control for an unactuated floating object with multiple vessels system, ISA Trans 123 (2022), 263–271.

12. Ling, Wang and Liu P.X., Adaptive tracking control of high-order nonlinear systems under asymmetric output constraint, Automatica (2020), https://doi.org/10.1016/j.automatica.2020.109281.

13. […], Wang and Liu P.X., Adaptive fuzzy finite-time tracking control of nonlinear systems with unmodeled dynamics, Appl Math Comput (2023), https://doi.org/10.1016/j.amc.2023.127992.

14. Miao, Li and Luo, A DSC and MLP based robust adaptive NN tracking control for underwater vehicle, Neurocomputing 111 (2013), 184–189.

15. Miao, Li and Luo, Multi Pseudo Q-learning-based deterministic policy gradient for tracking control of autonomous underwater vehicle, IEEE Trans Neural Netw Learn Syst 30 (2019), 3524–3546.

16. Dai S.-L., He, Wang and Yuan, Adaptive neural control of underactuated surface vessels with prescribed performance guarantees, IEEE Trans Neural Netw Learn Syst 30 (2019), 3686–3698.

17. […], Li, Gao, Shan, Chen C.L.P. and Xiao, Adaptive NN event-triggered control for path following of underactuated vessels with finite-time convergence, Neurocomputing 379 (2020), 203–213.

18. Wang, Liu and Li, Command filter based globally stable adaptive neural control for cooperative path following of multiple underactuated autonomous underwater vehicles with partial knowledge of the reference speed, Neurocomputing 275 (2018), 1478–1489.

19. Guo, Qin, Xu, Han, Fan Q.-Y. and Zhang, Composite learning adaptive sliding mode control for AUV target tracking, Neurocomputing 351 (2019), 1480–1486.

20. Liu Y.-C., Liu S.-Y. and Wang, Fully-tuned fuzzy neural network based robust adaptive tracking control of unmanned underwater vehicle with thruster dynamics, Neurocomputing 196 (2016), 1–13.

21. […], Xiao, Sam Ge and H. Su, Constrained multilegged robot system modeling and fuzzy control with uncertain kinematics and dynamics incorporating foot force optimization, IEEE Trans Syst Man Cy-S 46 (2016), 1–15.

22. Liu Y.-J. and Tong, Optimal control-based adaptive NN design for a class of nonlinear discrete-time block triangular systems, IEEE Trans Cybernetics 46 (2016), 2670–2680.

23. Cui, Yang, Li and Sharma, Adaptive neural network control of AUVs with control input nonlinearities using reinforcement learning, IEEE Trans Syst Man Cybern Syst 47 (2017), 1019–1029.

24. Song, Lewis F.L., Wei and Zhang, Off-policy actor-critic structure for optimal control of unknown system with disturbances, IEEE Trans Cybernetics 46 (2016), 1041–1050.

25. Zhang, Zhang, Yang G.-H. and Luo, Leader-based optimal coordination control for the consensus problem of multiagent differential games via fuzzy adaptive dynamic programming, IEEE Trans Fuzzy Syst 23 (2014), 152–163.

26. Song, Xiao, Zhang and Sun, Adaptive dynamic programming for a class of complex-valued nonlinear systems, IEEE Trans Neural Netw Learn Syst 25 (2014), 1733–1739.

27. Liu and Wei, Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems, IEEE Trans Neural Netw Learn Syst 25 (2014), 621–634.

28. […], Wang and He, Novel iterative neural dynamic programming for data-based approximate optimal control design, Automatica 81 (2017), 240–252.

29. Gao, Jiang Z.-P. and Chai, Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming, Automatica 72 (2016), 37–45.

30. Qin, Zhang and Luo, Online optimal tracking control of continuous time linear systems with unknown dynamics by using adaptive dynamic programming, Int J Control 87 (2014), 1000–1009.

31. Lin, Wei and Luo, A novel tracking control scheme for a class of discrete-time nonlinear systems using generalised policy iteration adaptive dynamic programming algorithm, Int J Syst Sci 48 (2017), 525–534.

32. Healey A.J. and Lienard, Multivariable sliding mode control for autonomous diving and steering of unmanned underwater vehicles, IEEE J Oceanic Eng 18 (1993), 327–339.

Backstepping method tracking control for underactuated AUV with unknown dynamics based on action-critic networks based ADP
