Abstract
This work investigates the trajectory-tracking control problem for underactuated autonomous underwater vehicles (AUVs) with unknown dynamics. To handle the unknown dynamics, an adaptive dynamic programming (ADP) scheme based on action-critic networks is combined with the backstepping approach, which achieves high-level system stability and tracking accuracy. Firstly, the backstepping approach is applied to the kinematic model of the underactuated AUV to produce a virtual velocity control, which is taken as the desired velocity input of the dynamic model of the underactuated AUV. Secondly, the error tracking system is constructed from the dynamic model of the underactuated AUV. Thirdly, the critic neural network and the action neural network are employed to transform the trajectory-tracking control problem into an optimal control problem based on the policy iteration algorithm. Finally, simulation results are given to verify the effectiveness of the proposed control scheme.
Keywords
Introduction
The motion control of autonomous underwater vehicles (AUVs) remains an active research topic because of its wide range of applications [1]. Tracking control is one fundamental function of motion control for AUVs [2, 3]. Many control schemes have been developed to solve the trajectory-tracking problem for fully-actuated AUVs, such as the sliding mode control method [4, 5], the adaptive control method [6], observer based adaptive dynamic programming (ADP) [7], the backstepping method [8] and the dynamic surface control method [9].
However, some AUVs are underactuated, meaning that the number of independent control variables is less than the number of degrees of freedom (DOF). The underactuated AUV is a complex system [10]. Unknown dynamics can reduce the tracking accuracy and system stability of the underactuated AUV, which makes trajectory-tracking control more difficult. These difficulties serve as the motivation of this work.
Traditionally, the tracking control problems of underactuated AUVs have been addressed with a variety of control techniques. The robust control method [11] and the backstepping method [12, 13] can deal with high-order nonlinear systems. However, the backstepping method depends on an accurate model of the AUV, and the robust control method can cause chattering. The adaptive neural control method does not depend on an accurate model of the AUV. Neural networks have been introduced to solve the trajectory-tracking problem for AUVs because of their strong nonlinear fitting ability, robustness and self-learning ability. An adaptive neural network tracking controller [14] built on a radial basis function neural network was developed for an underwater vehicle with an unknown nonlinear function. A deterministic policy gradient method [15] using multi Pseudo Q-learning was proposed to reduce the overestimation of the action-value function for underactuated AUVs with unknown dynamics and constrained inputs. Neural networks were employed to approximate the uncertain underactuated vessel dynamics and external disturbances [16]. A radial basis function neural network was employed to deal with the uncertain nonlinear dynamics of an underactuated vessel [17]. A neural network was designed to solve the cooperative path-following problem for a fleet of underactuated AUVs with uncertain nonlinear dynamics [18]. A neural network was employed to estimate the unknown nonlinear dynamics of an AUV [19]. Like the adaptive neural control method, the fuzzy control method does not require an accurate model of the AUV. However, the fuzzy control method depends strongly on prior knowledge. A robust adaptive control scheme based on a fully-tuned fuzzy neural network was proposed to solve the trajectory and attitude problem for an unmanned underwater vehicle with thruster dynamics and unknown disturbances [20]. The sliding mode method can overcome the influence of unknown disturbances on the AUV. However, it causes chattering in practical applications.
Although a number of traditional control schemes have given solutions to the tracking control problems for AUVs in different situations, it is necessary to develop more powerful control schemes to meet higher requirements on the control performance of AUVs. Some researchers have begun to focus on the two-player zero-sum game, which has developed considerably in recent years.
However, the studies mentioned above usually employ a state observer to estimate the system state in the presence of unknown disturbances or uncertainties, which may accumulate error. Therefore, it is necessary to seek an optimal control scheme for underactuated AUVs that minimizes the performance index function and also reduces the effect of unknown dynamics. Unfortunately, few studies have addressed adaptive dynamic programming (ADP) based trajectory-tracking control for underactuated AUVs with unknown dynamics. As a powerful optimization tool, the ADP scheme has received increasing attention [21]. The ADP scheme has been introduced to compensate for unknown dynamics, such as control input nonlinearities and model uncertainties, in nonlinear systems [22, 23]. Policy iteration [24–27] and value iteration [28, 29] are the two primary iterative ADP algorithms.
The main contributions of this paper can be stated as follows. The virtual velocity control input is designed using the backstepping method and is taken as the reference velocity input of the dynamic model of the underactuated AUV. The unknown dynamics [21, 30] are considered in this work. Then, the action-critic networks based ADP is employed to transform the trajectory-tracking control problem into an optimal control problem. The online policy iteration algorithm and the weight update laws are designed. To verify the effectiveness of the proposed method, simulation results are compared with those of the method proposed in [31].
This work is organized as follows. In Section 2, the kinematic model and the dynamic model of the underactuated AUV, based on the reference model of the AUV in [32], are given. In Section 3, the ADP optimal controller is designed via the backstepping method. In Section 4, two simulation examples are provided. Section 5 gives the conclusions.
Problem formulation and mathematical model of AUV
The mathematical model of the underactuated AUV is shown in Fig. 1. Two coordinate systems are used: one is the universal frame {O_e - X_e Y_e Z_e} and the other is the body-fixed frame {O_b - X_b Y_b Z_b}.

Fig. 1. Underactuated AUV model system.
Table 1. Notation of motion for underactuated AUV
The kinematic model is given as follows:
The dynamic model of underactuated AUV is established as follows:
The dynamic model (2) can be rewritten as follows:
The position error is defined as follows:
The time derivative of (4) is given as follows:
The velocity error is defined as follows:
The time derivative of (6) is given as follows:
The block diagram of the proposed control scheme is shown in Fig. 2.

Fig. 2. Underactuated AUV controller block diagram.
Considering the kinematic model of the underactuated AUV, we require e η → 0 as t → ∞. According to (4), (5) and (6), we can get the virtual velocity control input ξ vr , which is given as follows:
The time derivative of (8) is given as follows:
The derivative of (10) is calculated as follows:
From (11), the time derivative of V1 is negative semi-definite. It can be concluded that V1 is bounded and e η is uniformly ultimately bounded (UUB). This completes the proof.
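As a hedged illustration of the argument above, the following sketch shows the usual structure of a kinematic backstepping step. The kinematic form with transformation matrix J(η), the sign conventions of the errors, the gain matrix K_1, the invertibility of J(η) and the quadratic form of V_1 are all assumptions made for illustration; they are not taken from equations (1)-(11) of this work.

```latex
% Assumed kinematics and error definitions (illustrative only)
\dot{\eta} = J(\eta)\,\xi, \qquad e_\eta = \eta - \eta_d, \qquad e_\xi = \xi - \xi_{vr}

% Assumed Lyapunov function candidate
V_1 = \tfrac{1}{2}\, e_\eta^{T} e_\eta

% Virtual velocity chosen so that the nominal position error dynamics are \dot{e}_\eta = -K_1 e_\eta
\xi_{vr} = J^{-1}(\eta)\left( \dot{\eta}_d - K_1 e_\eta \right), \qquad K_1 = K_1^{T} > 0

% Substituting \xi = \xi_{vr} + e_\xi into \dot{V}_1 = e_\eta^{T}\left( J(\eta)\,\xi - \dot{\eta}_d \right)
\dot{V}_1 = -\, e_\eta^{T} K_1 e_\eta + e_\eta^{T} J(\eta)\, e_\xi
```

Under these assumptions the first term is non-positive, and the cross term vanishes when the velocity error e_ξ is zero, which is why the velocity error is handled separately by the dynamic-level controller.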
According to (4)-(9), the dynamic model (2) can be transformed as follows:
Let Λ = [e ξ , e η ] T , we can get the error tracking system as follows:
The performance index function is defined as follows:
If the performance index function (14) is continuously differentiable, then the Hamilton-Jacobi-Bellman (HJB) function can be derived as follows:
According to Bellman's optimality principle, the optimal performance index function can be represented as follows:
The optimal performance index function (16) satisfies the HJB function (15) as follows:
The optimal control law is
Considering the optimal law (18), the HJB function can be transformed as follows:
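For orientation, the standard textbook form of this derivation for a control-affine system with a quadratic utility is sketched below. The drift F(Λ), input matrix G(Λ), weights Q and R, and the quadratic utility are assumptions for illustration; the error tracking system (13) and the performance index (14) of this work may differ in form.

```latex
% Assumed control-affine error dynamics, with \Gamma the control input
\dot{\Lambda} = F(\Lambda) + G(\Lambda)\,\Gamma

% Assumed performance index with Q \ge 0 and R > 0
J(\Lambda) = \int_{t}^{\infty} \left( \Lambda^{T} Q \Lambda + \Gamma^{T} R\, \Gamma \right) d\tau

% Hamiltonian
H(\Lambda, \Gamma, \nabla J) = \Lambda^{T} Q \Lambda + \Gamma^{T} R\, \Gamma
    + \nabla J^{T} \left( F(\Lambda) + G(\Lambda)\,\Gamma \right)

% Stationarity condition \partial H / \partial \Gamma = 0 gives the optimal control
\Gamma^{*} = -\tfrac{1}{2}\, R^{-1} G^{T}(\Lambda)\, \nabla J^{*}

% Substituting \Gamma^{*} back into H = 0
0 = \Lambda^{T} Q \Lambda + \nabla J^{*T} F(\Lambda)
    - \tfrac{1}{4}\, \nabla J^{*T} G(\Lambda) R^{-1} G^{T}(\Lambda)\, \nabla J^{*}
```

The last line is a nonlinear partial differential equation in J*, which is generally not solvable in closed form; this is the usual motivation for approximating J* with a critic network.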
The critic network is designed to approximate
The derivative of equation (20) is represented as follows:
Let
Then, the HJB function can be derived as follows:
The square residual error
Given any admissible control law
According to the definition of ϱ1, there exist positive constants ϱ1M > 1 and ϱ1m > 0 such that ϱ1m ≤ ∥ ϱ1 ∥ ≤ ϱ1M.
Equation (25) can be transformed as follows:
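The idea of tuning critic weights by gradient descent on a squared residual error can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration, not the setup of this work: a scalar plant x_dot = a*x + b*u, a fixed linear policy u = -k*x, a single basis function sigma(x) = x^2, a normalization factor (1 + rho^2)^2, and the learning rate alpha. The regressor rho below plays a role similar to the normalized quantity bounded in the text, but its exact definition there may differ.

```python
import numpy as np


def critic_residual(w, x, k, a=-1.0, b=1.0, q=1.0, r=1.0):
    """Return the regressor rho and the Bellman residual e for V_hat(x) = w * x**2."""
    u = -k * x                      # fixed (assumed admissible) policy
    x_dot = a * x + b * u           # assumed scalar plant
    rho = 2.0 * x * x_dot           # d(sigma)/dx * x_dot with sigma(x) = x**2
    utility = q * x**2 + r * u**2   # assumed quadratic utility
    return rho, w * rho + utility   # residual of the Bellman (Lyapunov) equation


def train_critic(k, alpha=1.0, sweeps=20):
    """Normalized gradient descent on E = 0.5 * e**2 over a grid of sample states."""
    w = 0.0
    for _ in range(sweeps):
        for x in np.linspace(-2.0, 2.0, 41):
            rho, e = critic_residual(w, x, k)
            # dE/dw = rho * e; the denominator keeps the step size bounded
            w -= alpha * rho * e / (1.0 + rho**2) ** 2
    return w


k = 0.5
w_hat = train_critic(k)
# Closed-form value weight for this scalar example: (q + r*k^2) / (2*(1 + k)) with a=-1, b=1
w_exact = (1.0 + k**2) / (2.0 * (1.0 + k))
print(w_hat, w_exact)
```

In this toy case the learned weight can be compared with the closed-form value because the value function of a linear policy on a linear plant is exactly quadratic; for the AUV error system no such closed form is available, which is why a neural network approximation is used.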
The action network is designed to approximate Γ* as follows:
Let
According to (18) and (28), the feedback error is defined as follows:
The square residual error
It is desired to select
The weight update law (31) can be rewritten as follows:
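The critic and action updates together play the roles of policy evaluation and policy improvement in policy iteration. A minimal sketch of that alternation is given below for a scalar linear-quadratic problem, where both steps have closed forms. The plant parameters a and b, the weights q and r, and the initial gain k0 are hypothetical values, and the scalar setting is an assumption made so that the result can be checked against the Riccati solution; it is not the AUV model.

```python
import math


def policy_iteration(a, b, q, r, k0, iterations=10):
    """Policy iteration for x_dot = a*x + b*u with cost integral of q*x^2 + r*u^2."""
    if a - b * k0 >= 0.0:
        raise ValueError("initial gain k0 must be admissible (stabilizing)")
    k = k0
    history = []
    for _ in range(iterations):
        a_cl = a - b * k
        # Policy evaluation: V(x) = p*x^2 solves 2*a_cl*p + q + r*k^2 = 0
        p = (q + r * k**2) / (-2.0 * a_cl)
        # Policy improvement: u = -(1/2) * (1/r) * b * dV/dx = -(b*p/r) * x
        k = b * p / r
        history.append(p)
    return p, k, history


def riccati_solution(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0."""
    return r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2


p_pi, k_pi, p_history = policy_iteration(a=-1.0, b=1.0, q=1.0, r=1.0, k0=0.0)
p_star = riccati_solution(a=-1.0, b=1.0, q=1.0, r=1.0)
print(p_history[:3], p_star)
```

The requirement that the initial gain be stabilizing mirrors the requirement of an admissible initial control law in the online algorithm.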
According to the optimal control
The error tracking system (33) can be transformed using equation (27) as follows:
The derivative of the Lyapunov function candidate (35) along the trajectories of the error tracking system (34) is given as follows:
According to (26),
Hence,
According to equation (32),
Hence,
Hence,
Then, we can get
Therefore, it can be concluded that the tracking error Λ and the neural network estimation error
The model parameter values of the underactuated AUV are given in [2]. The numerical values of the parameters used in the simulations are as follows:
Example one without unknown dynamics
In this section, the simulation results of the underactuated AUV without unknown dynamics are shown in Figs. 3–9.

Fig. 3. e ξ with the method proposed in this work.

Fig. 4. e ξ with ADP without backstepping method.

Fig. 5. e η with the method proposed in this work.

Fig. 6. e η with ADP without backstepping method.

Fig. 7. Tracking of desired velocity.

Fig. 8. Tracking of desired position.

Fig. 9. Tracking of desired trajectory.
Figures 3 and 4 show e ξ of the underactuated AUV. The initial value of e ξ is [0.25, 0, 0, 0.01, 0.01] T . Figures 5 and 6 show e η of the underactuated AUV. The initial value of e η is [0.01, 0, 0, - 0.01, - 0.01] T . The convergence time of e ξ and e η with the proposed method is shown in Table 2. These simulation results show that e ξ and e η converge to 0 with the proposed method, whereas they do not converge to 0 with the ADP scheme without the backstepping method because of actuator saturation. In addition, e ξ and e η converge faster with the proposed method.
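One simple way to extract a convergence time from a simulated error trajectory is sketched below. The tolerance tol, the use of the Euclidean norm, and the exponentially decaying test signal are assumptions for illustration; the criterion actually used to fill the convergence-time tables is not stated in the text. Only the initial value of e ξ is taken from the simulation setup above.

```python
import numpy as np


def convergence_time(t, err, tol=1e-2):
    """Return the first sample time after which ||err|| stays below tol, or None."""
    norms = np.linalg.norm(err, axis=1)       # one norm per time sample
    above = np.nonzero(norms >= tol)[0]       # indices where the error is still large
    if above.size == 0:
        return float(t[0])                    # already converged at the start
    last = above[-1]
    if last == len(t) - 1:
        return None                           # never settles within the horizon
    return float(t[last + 1])


# Hypothetical error signal: the initial e_xi from the text with an assumed exp(-t) decay
t = np.linspace(0.0, 10.0, 1001)
e_xi0 = np.array([0.25, 0.0, 0.0, 0.01, 0.01])
err = np.exp(-t)[:, None] * e_xi0
t_conv = convergence_time(t, err)
print(t_conv)
```

Requiring the norm to stay below the tolerance for the rest of the horizon, rather than merely cross it once, avoids reporting an early time for a trajectory that later leaves the band.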
Table 2. Convergence time of tracking error
The tracking of the desired velocity of the underactuated AUV with the two control schemes is given in Fig. 7. Figure 8 shows the tracking of the desired position of the underactuated AUV with the two control schemes. Figure 9 illustrates the spatial trajectories of the underactuated AUV with the two control schemes. These simulation results show that the proposed method has better trajectory-tracking accuracy.
In this section, the unknown dynamics are considered.
Figures 10 and 11 show e ξ of the underactuated AUV. The initial value of e ξ is [0.25, 0, 0, 0.01, 0.01] T . Figures 12 and 13 show e η of the underactuated AUV. The initial value of e η is [0.01, 0, 0, - 0.01, - 0.01] T . From Figs. 10 and 12, we can see that e ξ and e η converge to 0 with the proposed method. Because of the unknown dynamics, there is small bounded chattering during convergence. The convergence time of the tracking error with the proposed method is shown in Table 3. Compared with example one, the convergence is a little slower because of the unknown dynamics. The time response of the underactuated AUV shows that the proposed control method delivers better performance than the ADP scheme without the backstepping method.

Fig. 10. e ξ with the method proposed in this work.

Fig. 11. e ξ with ADP without backstepping method.

Fig. 12. e η with the method proposed in this work.

Fig. 13. e η with ADP without backstepping method.
Table 3. Convergence time of tracking error
The tracking of the desired velocity of the underactuated AUV with the two control schemes is given in Fig. 14. Figure 15 shows the tracking of the desired position of the underactuated AUV with the two control schemes. Figure 16 illustrates the spatial trajectories of the underactuated AUV with the two control schemes. These simulation results show that the proposed method has better trajectory-tracking accuracy.

Fig. 14. Tracking of desired velocity.

Fig. 15. Tracking of desired position.

Fig. 16. Tracking of desired trajectory.
The stability of the error tracking system (34) is guaranteed based on the Lyapunov stability theorem. The simulation results show that the tracking errors of system (34) converge with the proposed scheme, whereas they do not converge with the ADP scheme without the backstepping method. The proposed control scheme achieves good tracking performance.
At the same time, the proposed method does not take limits on the uncertainties into account. Future research will concentrate on improving tracking accuracy and stability for cooperative tracking control problems of multiple AUVs with uncertainties. In addition, deep reinforcement learning will be considered in future work.
Declarations
Funding: There is no funding to support this work.
Conflicts of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this manuscript.
Authors’ contributions: G. Che designed the control method, performed the simulation experiments and wrote the manuscript. Z. Yu designed the control method and analyzed the stability of the system.
