Deep submergence rescue vehicle docking based on parameter adaptive control with acoustic and visual guidance

Abstract

Determining the attitude of a wrecked submarine and automatically matching the attitude of a deep submergence rescue vehicle to it are both difficult during docking guidance. To address these difficulties, this study proposes a docking method based on parameter adaptive control with acoustic and visual guidance. The method omits the process of obtaining the information of the wrecked submarine in advance, thus saving considerable detection time and improving rescue efficiency. A parameter adaptive controller based on reinforcement learning is designed. The S-plane and proportional integral derivative controllers are trained through reinforcement learning to obtain control parameters that improve the environmental adaptability and anti-current ability of deep submergence rescue vehicles. The effectiveness of the proposed method is verified by simulation and pool tests. The comparison experiment shows that the parameter adaptive controller based on reinforcement learning has better control effect, accuracy, and stability than the untrained control method.

Keywords

Deep submergence rescue vehicles, underwater docking, parameter adaptive control, reinforcement learning, acoustic and visual guidance

Introduction

Submarines are characterized by good concealment, long range, and strong penetration, which has led to their wide use. However, because of their particular working environment, getting the crew out of danger remains a difficult problem.^1–5 Incomplete statistics reveal that more than 400 accidents involving submarines in peaceful environments worldwide have been reported since 1900, resulting in the sinking of more than 180 submarines and the death of more than 3000 sailors.^6,7 The accident of the Russian navy’s “KURSK” nuclear submarine in August 2000 shocked the world and drew attention to research on submarine rescue technology.⁸

After a submarine accident, the crew can escape in many ways. Among them, waiting for a deep submergence rescue vehicle (DSRV) is the most reliable and effective method worldwide.^9–12

On April 10, 1963, the US Navy’s nuclear submarine “Thresher” was lost during a deep diving test in the Atlantic Ocean, resulting in the death of 129 people.¹³ This event prompted the US Navy to propose a deep submergence rescue plan in May 1964. Lockheed Missiles and Space Company built the US DSRV-1 Mystic, the world’s first DSRV, which was launched in 1970. The Avalon, which was roughly the same size as the Mystic and had similar functions, was built in 1971. Both vehicles entered service in 1977 and retired in 2000. Russia has two series of DSRVs, namely, Bester and Priz. The Priz series consists of four DSRVs, namely AS-26 (1986), AS-28 (1989), AS-30 (1989), and AS-34 (1991).¹⁴ The LR series was produced by Perry Slingsby Systems in Britain. The LR5 is used by the North Atlantic Treaty Organization for submarine rescue; it participated in the rescue activities for the Russian “KURSK” nuclear submarine.^15,16

China began its technical research on DSRVs in the 1970s and placed its self-developed DSRV into service in 1987. This DSRV has a maximum diving depth of 600 m and a maximum speed of 4 knots. It can be used for docking and rescue when the current is less than 1.5 knots, the water visibility is more than 0.5 m, and the inclination of the submarine is not too large. In 2008, China purchased the DSRV LR7 from Britain.

The docking process¹⁷ between a DSRV and a wrecked submarine is a key step in deep-dive rescue, and the control method¹⁸ largely determines whether docking succeeds. Xia et al.¹⁹ designed an adaptive fuzzy control law using the command filtered backstepping method, considering the system uncertainties and unknown disturbances of a DSRV. A second-order filter was introduced in the controller design to approximate the virtual control signal and its derivative. However, only a simulation test was conducted for verification, so control accuracy and other problems may appear in an actual test. Park et al.²⁰ introduced a docking process for a testbed autonomous underwater vehicle (AUV) using a charge-coupled device camera. A two-stage final approach for stable docking at the terminal was suggested, and a vision guidance controller was designed with proportional integral derivative controllers for the vertical and horizontal planes. However, this visual guidance process can hardly obtain the attitude of the docking device and is therefore applicable only when the deviation angle of the docking device is small. Teo et al.²¹ presented an AUV docking method that resists unknown water currents. The approach incorporated a Takagi–Sugeno–Kang fuzzy inference system. A current compensator was designed and applied to fuzzy docking guidance so that the vehicle can maintain course under the disturbance of water flow. However, the method fixed the direction of the docking device and was not universally applicable. Li et al.²² designed a docking control algorithm for AUVs. A control scheme with a three-layer control loop structure was proposed and embedded with an online current compensator and effective control parameter setting. The validity of the method was verified through experiments. However, this method had a weak anti-interference ability: when the flow velocity is large, the cross-track error increases, resulting in docking failure.

Therefore, the underwater rescue docking process faces the following problems:

Target identification and positioning efficiency: During guidance, an AUV is often used to determine the position and attitude of the wrecked submarine before rescue, which is time consuming.

Adaptive controller: The site of a wrecked submarine is often subject to severe sea conditions, which requires the controller to have strong anti-interference ability and self-adaptability.

Control accuracy: High control accuracy is needed to improve the success rate of docking between DSRV and submarine.

A new docking guidance method based on visually assisted underwater acoustics is proposed to address these problems. The method combines visual and acoustic measurements to determine the attitude of the submarine and adjusts the DSRV accordingly. Moreover, it omits the process of obtaining the information of the wrecked submarine in advance, thus saving considerable detection time and improving rescue efficiency. It also solves the automatic attitude matching problem for submarines with large inclination under unfavorable sea conditions. A parameter adaptive controller based on reinforcement learning is designed in this study. The controller parameters are trained through reinforcement learning so that they adapt to the environment; thus, the control effect of the DSRV in the complicated underwater environment and the accuracy and stability of the docking process are improved. In addition, if large-inclination positioning of a DSRV is achieved through propellers only, the required docking angle and accuracy are difficult to achieve, and the complexity of thrust distribution and the energy consumption increase. Therefore, a ballast tank combined with a pump valve (solenoid valve) system is used to achieve the trim and heel control of the test DSRV.

The rest of this article is organized as follows: The second section proposes the underwater docking guidance method combining vision and acoustics. The third section designs a parametric adaptive controller based on reinforcement learning, including the DSRV motion control and water tank regulation control algorithms. The fourth section presents the simulation experiment of the docking process. The fifth section discusses the experiment on DSRV docking and compares the results with that of the conventional control algorithm, thereby proving the advantages and effectiveness of the proposed method.

Acoustic and visual guidance during docking

If visual guidance is the only method used during docking, the attitude of the submarine cannot be acquired because imaging is planar. If only acoustic guidance is adopted, determining the center position of the target is difficult because the attitude of the target is uncertain. Visual and acoustic methods are therefore combined to identify and locate objects. First, the position of the docking device is determined by the ultrashort baseline (USBL) positioning system. After the DSRV arrives near the docking device, the target is identified and positioned with the single shot multibox detector (SSD) algorithm using the camera mounted on the DSRV. The target is positioned at the center of the image by adjusting the position of the DSRV. The attitude of the DSRV is then adjusted using the water tanks so that the slant distances measured to the four USBL transponders are approximately equal. Second, the position of the DSRV is fine-tuned, placing the target at the center of the image again. Lastly, the DSRV dives, and the position and attitude of the DSRV are adjusted to complete the docking.

Visual guidance

The SSD algorithm²³ is used to identify and locate the docking device. The DSRV is equipped with an underwater searchlight, and a reflector is installed on the docking device to reflect the light, which is identified by the camera.

Deep learning is used to study the real-time detection and positioning of relative moving target points. The SSD method is adopted to extract the convolution features of multiple scales with VGG16 as the network model under the TensorFlow framework.

The image physical coordinates of the target’s center point and four corners are obtained and then converted into the camera coordinate system. The position of the target in the geodetic coordinate system is obtained in accordance with the transformation relation between the camera and geodetic coordinate systems. This target position is the target point of DSRV.

The pinhole model is taken as the imaging model of the camera. Figure 1 shows the three coordinate systems involved. (X_W , Y_W , Z_W ) is the world coordinate system, also known as the global coordinate system. (xoy) is the camera coordinate system, which takes the focal point of the camera model as the origin, the longitudinal (length) direction of the vehicle as the x-axis, the transverse (beam) direction of the vehicle as the y-axis, and the optical axis of the camera as the z-axis. The image coordinate system is divided into the image pixel coordinate system (XOY) and the image physical coordinate system (X_fO_fY_f ). The origin of the image physical coordinate system is the intersection of the optical axis of the lens and the imaging plane, and its axes are parallel to the x- and y-axes of the camera coordinate system. The image pixel coordinate system, also known as the computer image coordinate system, is a planar rectangular coordinate system fixed on the image, with pixels as its unit and its origin in the upper left corner of the image. Its axes are parallel to those of the image physical coordinate system.

Figure 1.

Schematic representation of camera coordinate system transformation.

The transformation relationships among the coordinate systems are as follows:

The transformation relationship between the world and camera coordinate systems is as follows

[\begin{matrix} x \\ y \\ z \\ 1 \end{matrix}] = [\begin{matrix} R & T \\ 0^{T} & 1 \end{matrix}] \cdot [\begin{matrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{matrix}] = [\begin{matrix} r_{11} & r_{12} & r_{13} & t_{x} \\ r_{21} & r_{22} & r_{23} & t_{y} \\ r_{31} & r_{32} & r_{33} & t_{z} \\ 0 & 0 & 0 & 1 \end{matrix}] \cdot [\begin{matrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{matrix}]

where T is the coordinates of the origin of the world coordinate system in the camera coordinate system and the matrix R is the orthogonal rotation matrix. R meets the following constraints

\begin{array}{l} r_{11}^{2} + r_{12}^{2} + r_{13}^{2} = 1 \\ r_{21}^{2} + r_{22}^{2} + r_{23}^{2} = 1 \\ r_{31}^{2} + r_{32}^{2} + r_{33}^{2} = 1 \end{array}

The transformation relationship between the image and camera coordinate systems is as follows:

The point p in the camera coordinate system is converted to the point P in the image physical coordinate system according to the following transformation relationship

\{\begin{matrix} X = f x / z \\ Y = f y / z \end{matrix}

where f is the focal length of the camera. The above equation is expressed by homogeneous coordinates as follows

z [\begin{matrix} X \\ Y \\ 1 \end{matrix}] = [\begin{matrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}] [\begin{matrix} x \\ y \\ z \\ 1 \end{matrix}]

The above formula is converted into the image coordinate system

\{\begin{array}{l} u - u_{0} = X / d_{x} = s_{x} X \\ v - v_{0} = Y / d_{y} = s_{y} Y \end{array}

where u and v are the image pixel coordinates, and u ₀ and v ₀ are the image center coordinates. $d_{x}$ and $d_{y}$ are the physical dimensions of a pixel in the X- and Y-directions, respectively. $s_{x} = 1 / d_{x}$ and $s_{y} = 1 / d_{y}$ are the sampling frequencies in the X- and Y-directions, respectively, namely, the number of pixels per unit length.

The transformation relationship between the world and image coordinate systems is as follows

\{\begin{array}{l} \frac{X}{f} = \frac{u - u_{0}}{f_{x}} = \frac{r_{11} x_{w} + r_{12} y_{w} + r_{13} z_{w} + t_{x}}{r_{31} x_{w} + r_{32} y_{w} + r_{33} z_{w} + t_{z}} \\ \frac{Y}{f} = \frac{v - v_{0}}{f_{y}} = \frac{r_{21} x_{w} + r_{22} y_{w} + r_{23} z_{w} + t_{y}}{r_{31} x_{w} + r_{32} y_{w} + r_{33} z_{w} + t_{z}} \end{array}

where $f_{x} = f s_{x}$ and $f_{y} = f s_{y}$ are the focal lengths expressed in pixels. The above equation is converted into homogeneous coordinates

z [\begin{matrix} u \\ v \\ 1 \end{matrix}] = [\begin{matrix} f_{x} & 0 & u_{0} & 0 \\ 0 & f_{y} & v_{0} & 0 \\ 0 & 0 & 1 & 0 \end{matrix}] [\begin{matrix} R & T \\ 0^{T} & 1 \end{matrix}] [\begin{matrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{matrix}] = M_{1} M_{2} X = M X

This equation is the mathematical expression of the pinhole model, where M ₁ contains the internal parameters of the camera and M ₂ contains the external parameters. The internal and external parameters can be solved from several known object points and the coordinates of the corresponding image points; Figure 2 shows the underwater calibration test.

Figure 2.

Underwater camera calibration test.

Table 1 lists the camera parameters obtained through the experiment.

Table 1.

Camera parameters.

Parameters	Values
$[\begin{matrix} f_{u} & 0 & u_{0} \\ 0 & f_{v} & v_{0} \\ 0 & 0 & 1 \end{matrix}]$	$[\begin{matrix} 2562.36684 & 0 & 945.82388 \\ 0 & 2565.20335 & 559.17470 \\ 0 & 0 & 1 \end{matrix}]$
$[R T]$	$[\begin{matrix} - 0.842076 & - 0.542677 & - 0.0192836 & 476.845 \\ 0.273430 & - 0.446289 & 0.831895 & 67.311 \\ - 0.464910 & 0.688297 & 0.5241947 & 1190.81 \end{matrix}]$
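As an illustration of how the pinhole model is applied, the following minimal sketch projects a world point to pixel coordinates with the parameters of Table 1. The function name `project` and the choice of test point are assumptions made for this example, and the units of the translation column are not stated in the source.

```python
import numpy as np

# Intrinsic matrix and [R T] copied from Table 1.
K = np.array([[2562.36684, 0.0, 945.82388],
              [0.0, 2565.20335, 559.17470],
              [0.0, 0.0, 1.0]])
RT = np.array([[-0.842076, -0.542677, -0.0192836, 476.845],
               [0.273430, -0.446289, 0.831895, 67.311],
               [-0.464910, 0.688297, 0.5241947, 1190.81]])


def project(world_point):
    """Project a 3-D world point with the pinhole model.

    Returns the pixel coordinates (u, v) and the depth z of the point
    in the camera coordinate system.
    """
    xw = np.append(np.asarray(world_point, dtype=float), 1.0)  # homogeneous form
    cam = RT @ xw          # camera coordinates (x, y, z)
    uvw = K @ cam          # equals z * (u, v, 1)
    return uvw[:2] / uvw[2], cam[2]


# Hypothetical test point: the origin of the world coordinate system.
(u, v), depth = project([0.0, 0.0, 0.0])
print(u, v, depth)
```

For the world origin, the camera coordinates equal the translation column of [R T], so the returned depth equals t_z.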

The position of the target relative to the DSRV in the world coordinate system can be solved in real time according to the distance between the target point and the image center in the imaging position of the camera after obtaining the camera parameters. The movement of the DSRV is controlled until the target is in the imaging center, so that the DSRV and the rescued submarine are aligned in the vertical direction. When the target appears in the imaging center, the distance from the DSRV to the target can be obtained in accordance with the actual size of the target and the imaging size.
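The range estimate from target size can be sketched with the similar-triangles relation of the pinhole model. This is an assumed formulation, since the source does not give the formula explicitly; the function name and the example target size are hypothetical.

```python
def range_from_size(focal_px, real_size_m, image_size_px):
    """Estimate the distance to a centered target of known size.

    Assumes the pinhole relation image_size = focal * real_size / distance,
    with the focal length in pixels and the target facing the camera.
    """
    if image_size_px <= 0:
        raise ValueError("image size must be positive")
    return focal_px * real_size_m / image_size_px


# Hypothetical example: a 0.5 m wide target imaged with f_u from Table 1.
distance = range_from_size(2562.36684, 0.5, 256.236684)
print(distance)
```

The estimate degrades when the target is strongly inclined relative to the image plane, which is one reason the acoustic measurement is needed for attitude.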

Acoustic guidance

After the DSRV and the submarine are aligned in the vertical direction with the visual guidance, the DSRV is guided to the submarine’s attitude with acoustic guidance. Figure 3 shows the installation of four underwater acoustic transponders on the docking device and the installation of the positioning base station on the docking guidance system of the DSRV.

Figure 3.

Installation position of the four transponders on the docking platform: (a) main view and (b) top view.

The target attitude of the DSRV, namely the heading angle ψ, trim angle θ, and heeling angle φ, is calculated from the distances between the underwater acoustic transponders and the DSRV using the following formula. In Figure 4, point O is the position of the DSRV; (x_w ₁, y_w ₁), (x_w ₂, y_w ₂), (x_w ₃, y_w ₃), and (x_w ₄, y_w ₄) are the positions of the four underwater acoustic transponders

\begin{array}{l} ψ = \arctan \frac{y_{w 3} - y_{w 1}}{x_{w 3} - x_{w 1}} \\ θ = \arccos \frac{2 d^{2} + c^{2} - {(L_{3} - L_{1})}^{2}}{\sqrt{2} d c} \\ φ = \arccos \frac{2 d^{2} + b^{2} - {(L_{2} - L_{4})}^{2}}{\sqrt{2} d b} \end{array}

Figure 4.

Attitude calculation diagram: (a) main view and (b) side view.

where

c = \sqrt{2 L_{3}^{2} - (L_{3}^{2} + L_{1}^{2} L_{3} - 2 L_{3} d^{2}) / L_{1}}

b = \sqrt{2 L_{2}^{2} - (L_{2}^{2} + L_{4}^{2} L_{2} - 2 L_{2} d^{2}) / L_{4}}

L ₁, L ₂, L ₃, and L ₄ are the slant distances measured to the four underwater acoustic transponders, and d is the mounting distance between two adjacent transponders. The transponder with the minimum measured distance is designated no. 1, and nos. 2, 3, and 4 are numbered counterclockwise from it.
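The heading computation and the slant-distance check used during attitude matching can be sketched as follows. The use of `atan2` instead of a plain arctangent (to keep the correct quadrant) and the helper names are assumptions of this example, not part of the original method.

```python
import math


def heading_from_transponders(p1, p3):
    """Heading angle (rad) of the line from transponder 1 to transponder 3.

    p1 and p3 are (x_w, y_w) positions. atan2 is used so that the angle
    stays well defined when x_w3 == x_w1.
    """
    return math.atan2(p3[1] - p1[1], p3[0] - p1[0])


def slant_spread(distances):
    """Difference between the largest and smallest measured slant distance.

    A small spread indicates that the DSRV attitude roughly matches the
    attitude of the docking platform.
    """
    return max(distances) - min(distances)


print(heading_from_transponders((0.0, 0.0), (1.0, 1.0)))
print(slant_spread([7.556, 7.825, 7.765, 8.015]))
```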

Docking places strict requirements on the attitude of the rescue vehicle. Therefore, testing the positioning accuracy of the USBL is necessary. The docking platform is placed underwater, and the base station of the docking guidance system is suspended at a certain depth in the water for a static test. A sound velocity profiler is used to collect the sound velocity information of the tank, and the sound velocity is corrected before the test. Table 2 presents the mean slant distances of the four transponders (7.556, 7.825, 7.765, and 8.015 m) and their measurement errors, the largest of which is 0.002 m. The measured slant distances of the four transponders remain stable, and the error is less than 1 cm, meeting the requirements of docking.

Table 2.

Mean value and measurement error of the transponders.

Number	L (m)	Error (m)	Error / (%)
1	7.556	0.002	0.026
2	7.825	0.001	0.013
3	7.765	0.002	0.026
4	8.015	0.001	0.012
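The percentage column of Table 2 can be reproduced from the first two columns. The short sketch below recomputes it as 100 × error / L; the variable names are chosen for this example only.

```python
# (transponder number, mean slant distance in m, error in m) from Table 2
rows = [(1, 7.556, 0.002), (2, 7.825, 0.001), (3, 7.765, 0.002), (4, 8.015, 0.001)]

# Relative error in percent for each transponder
relative_error = {n: 100.0 * err / dist for n, dist, err in rows}

for n, value in relative_error.items():
    print(n, round(value, 3))
```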

Parameter adaptive controller based on online learning

The development of science and technology has led to an increasing number of control methods being applied in the AUV field.²⁴ The S-plane control method²⁵ is adopted for the propeller control of the DSRV, and the integral separation proportional–integral–derivative (PID) control method²⁶ is adopted for the water tank adjustment of the DSRV. The deep deterministic policy gradient (DDPG) method²⁷ is used to train the control parameters, which gives the controller an online learning ability and improves the adaptability of the parameters.

Hydrodynamic modeling

The CHUAN SUO (CS) deep submergence vehicle was taken as the experiment platform (Figure 5).

Figure 5.

CHUAN SUO deep submergence rescue vehicle.

A dynamic model of the DSRV is established^28,29

M \dot{\vec{v}} + C (\vec{v}) \vec{v} + D (\vec{v}) \vec{v} + g (\vec{η}) = τ + g_{0}

where M is the inertia coefficient matrix of the system, which satisfies $M = M_{R B} + M_{A} \geq 0$ ; $M_{R B}$ is the inertia matrix of the carrier, which satisfies $M_{R B} = M_{R B}^{T} \geq 0$ and ${\dot{M}}_{R B} = 0$ ; $M_{A}$ is the added mass coefficient matrix, which satisfies $M_{A} = M_{A}^{T} > 0$ ; $C (\vec{v}) = C_{R B} (\vec{v}) + C_{A} (\vec{v})$ is the Coriolis force coefficient matrix, in which $C_{A} (\vec{v})$ satisfies $C_{A} (\vec{v}) = - C_{A}^{T} (\vec{v})$ ; $D (\vec{v})$ is the viscous hydrodynamic coefficient matrix, which is positive definite, that is, ${\vec{x}}^{T} D (\vec{v}) \vec{x} > 0, \forall \vec{x} \neq 0$ ; $τ$ is the control input vector; $g_{0}$ is the static load vector, which is set to 0 to simplify the study; and $g (\vec{η})$ is the restoring force/torque vector.

The parameters of each matrix are expressed as follows³⁰

M = diag \{m - X_{\dot{u}}, m - Y_{\dot{v}}, m - Z_{\dot{w}}, I_{x x} - K_{\dot{p}}, I_{y y} - M_{\dot{q}}, I_{z z} - N_{\dot{r}}\}

C (v) = [\begin{matrix} 0_{3 \times 3} & C_{1}^{} \\ - C_{1}^{T} & C_{2} \end{matrix}]

where

C_{1} = [- \begin{matrix} 0 & (m_{33} - Z_{\dot{w}}) w_{r} & - (m_{22} - Y_{\dot{v}}) v_{r} \\ (m_{33} - Z_{\dot{w}}) w_{r} & 0 & (m_{11} - X_{\dot{u}}) u_{r} \\ (m_{22} - Y_{\dot{v}}) v_{r} & - (m_{11} - X_{\dot{u}}) u_{r} & 0 \end{matrix}]

C_{2} = [- \begin{matrix} 0 & (I_{z z} - N_{\dot{r}}) r & - (I_{y y} - M_{\dot{q}}) q \\ (I_{z z} - N_{\dot{r}}) r & 0 & (I_{x x} - K_{\dot{p}}) p \\ (I_{y y} - M_{\dot{q}}) q & - (I_{x x} - K_{\dot{p}}) p & 0 \end{matrix}]

D (v) = - diag {\begin{matrix} X_{\dot{u}} + X_{u |u|} |u|, Y_{\dot{v}} + Y_{v |v|} |v|, Z_{\dot{w}} + Z_{w |w|} |w|, \\ K_{\dot{p}} + K_{p |p|} |p|, M_{\dot{q}} + M_{q |q|} |q|, N_{\dot{r}} + N_{r |r|} |r| \end{matrix}}

g (η) = [\begin{matrix} \begin{matrix} (W - B) s θ \\ \begin{matrix} - (W - B) c θ s ψ \\ - (W - B) c θ c ψ \end{matrix} \end{matrix} \\ \begin{matrix} y_{B} B c θ c ψ - z_{B} B c θ s ψ \\ \begin{matrix} - z_{B} B s θ - x_{B} B c θ c ψ \\ x_{B} B c θ s ϕ + y_{B} B s θ \end{matrix} \end{matrix} \end{matrix}]

τ = {[X, 0, Z, 0, M, N]}^{T}

where W and B are the weight and buoyancy of the vehicle, respectively. $(x_{B}, y_{B}, z_{B})$ are the coordinates of the center of buoyancy. c, s, and t are shorthand for cos, sin, and tan, respectively.

Accurate hydrodynamic coefficients are needed in the simulation training of controller parameter adjustment through deep reinforcement learning. Table 3 lists the hydrodynamic coefficients of the DSRV, which were obtained through calculation and a model test.

Table 3.

Hydrodynamic coefficients of the DSRV.

Coefficient = value	Coefficient = value	Coefficient = value	Coefficient = value
X’_qq = 0.003368	X_rr = −0.004231	X_du = −0.03012	X_wp = 0.074549
X_uu = −0.0225	Y ₀ = 0.00118	Y_dv = −0.11219	Y_vv = −0.07808
Y_r = −0.01178	Y_dr = 0.004231	Y_pq = −0.003368	Y_dv = −0.051386
Y_wp = −0.074549	Y_v\|v\| = −0.07808	Y_v = −0.06562	Y_r\|r\| = 0.00532
Y_v\|r\| = −0.000243	Z ₀ = −0.00204	Z_w = −0.071	Z_w\|w\| = −0.06137
Z_dw = −0.074549	Z_q = −0.0124221	Z_dq = 0.003368	Z_rp = −0.004231
Z_vp = 0.051386	Z_w\|q\| = −0.001472	Z_q\|q\| = 0.004809	K_p = −0.00347
K_dp = 0.000297	K_qr = −0.00037	K_vq = 0.007599	K_wr = 0.007599
M_dw = −0.00956	M_w = −0.0246	M_q = 0.000352	M_dq = −0.00288
M_vp = −0.004231	M_pr = 0.003547	M_w\|w\| = −0.000736	M_w\|q\| = −0.00962
M_q\|q\| = −0.000216	N_v = −0.02321	N_dv = −0.00659	N_r = −0.000407
N_dr = −0.00325	N_wp = 0.003368	N_rr = −0.0001751	N_pq = −0.003177
N_qr = 0.00325	N_v\|r\| = 0.01063

DSRV: deep submergence rescue vehicle.

Design of controller

The DSRV has two control objects: the propellers and the regulating tanks. The speed, position, heading, and depth of the DSRV are controlled through the output force of the propellers. The roll and trim of the DSRV are controlled by adjusting the water tanks. The S-plane control algorithm is adopted for propeller control, and the integral separation PID control algorithm is adopted for water tank adjustment. At the same time, the DDPG method is used to train the parameters of the controller, which realizes variable parameter adaptive control.

Basic control algorithm

S-plane controller

S-plane control combines ideas from fuzzy control and the PID method.³¹ Figure 6 shows the control surface of the S-plane method, which expresses the relationship among the deviation, the deviation rate, and the control force. The algorithm is expressed as follows

\{\begin{array}{l} u_{i} = 2.0 / (1.0 + exp (- k_{i 1} e_{i} - k_{i 2} \dot{e_{i}})) - 1.0 + Δ u_{i} \\ f_{i} = K_{i} u_{i} \end{array}

Figure 6.

Control surface of the S-plane.

where i denotes the ith degree of freedom. $e_{i}$ and $\dot{e_{i}}$ are the control error and its derivative. $u_{i}$ is the normalized control output. $k_{i 1}$ and $k_{i 2}$ are the control parameters. $f_{i}$ is the control force. $K_{i}$ is the maximum force or torque. $Δ u_{i}$ corresponds to the magnitude of the interference force.
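A minimal sketch of the S-plane law for a single degree of freedom is given below. The function name, the argument names, and the clamping of the exponent (added only to avoid floating-point overflow) are assumptions of this example.

```python
import math


def s_plane_force(e, e_dot, k1, k2, max_force, delta_u=0.0):
    """S-plane control law for one degree of freedom.

    e, e_dot  : control error and its derivative
    k1, k2    : control parameters
    max_force : maximum force or torque K_i
    delta_u   : offset compensating a steady interference force
    """
    # Clamp the exponent so math.exp cannot overflow; the sigmoid is
    # already saturated well before these limits (assumption of this sketch).
    z = max(-60.0, min(60.0, -k1 * e - k2 * e_dot))
    u = 2.0 / (1.0 + math.exp(z)) - 1.0 + delta_u
    return max_force * u


print(s_plane_force(0.0, 0.0, 3.0, 1.0, 100.0))   # zero error gives zero force
print(s_plane_force(10.0, 0.0, 3.0, 1.0, 100.0))  # large error saturates near K_i
```

Without the offset term, the output is bounded by the maximum force and is antisymmetric in the error, which matches the shape of the control surface in Figure 6.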

Integral separation PID controller

The CS uses the water tanks and an electromagnetic valve to control the trim and heel.

Figure 7 shows the arrangement of the ballast tanks. Four tanks are arranged on both sides of the pressure cabin to adjust heel, and two tanks are arranged at the two ends of the pressure cabin to adjust trim. The pump and valve system is installed in the pressure cabin. Ballast water is regulated by controlling the solenoid valve.

Figure 7.

The tank arrangement for CS. CS: CHUAN SUO.

Trim and heel tests performed by manually controlling the solenoid valve show that the inclination angle of the CS changes approximately linearly with time. In this study, an integral separation control algorithm based on PID is used to calculate the opening and closing time of the solenoid valve, so as to realize the automatic control of trim and heel

t (k) = K_{P} e (k) + K_{I} \sum_{i = 0}^{k} e (i) + K_{D} \dot{e} (k)

where k is the index of the current control step. t is the opening or closing time of the valve. $e (k)$ and $\dot{e} (k)$ are the control error and its derivative. K_P , K_I , and K_D are the proportional, integral, and derivative coefficients, respectively. When the magnitude of e is greater than the threshold value, the integral term is removed. Otherwise, the integral term is added.
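The integral separation rule can be sketched as follows. The class name, the finite-difference derivative, and the choice to pause error accumulation while the integral term is removed are assumptions of this example; the source does not specify these details.

```python
class IntegralSeparationPID:
    """PID law whose integral term is active only for small errors."""

    def __init__(self, kp, ki, kd, threshold, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.threshold = threshold  # error magnitude above which integration stops
        self.dt = dt                # control period in seconds
        self.error_sum = 0.0
        self.prev_error = None

    def step(self, error):
        """Return the valve actuation time t(k) for the current error."""
        # Backward-difference derivative; zero on the first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error

        if abs(error) > self.threshold:
            integral_term = 0.0          # integral term removed for large errors
        else:
            self.error_sum += error      # accumulate only inside the threshold
            integral_term = self.ki * self.error_sum

        return self.kp * error + integral_term + self.kd * derivative


pid = IntegralSeparationPID(kp=2.0, ki=0.5, kd=0.0, threshold=1.0, dt=1.0)
print(pid.step(5.0))  # large error: proportional action only
print(pid.step(0.5))  # small error: integral term is added
```

Removing the integral term for large errors limits overshoot when the inclination is far from its target, while the integral action near the target removes the steady-state error.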

Variable parameter controller based on reinforcement learning

The DDPG algorithm is adopted to train the parameters of the controller, which realizes the real-time adaptive adjustment of the parameters.

DDPG is a deep reinforcement learning algorithm that combines the Actor–Critic framework with the deep Q-network (DQN). It applies two ideas from the DQN structure to Actor–Critic: the memory bank and the use of two neural networks with the same structure but different parameter update frequencies. The DDPG algorithm is divided into two parts, Main Net and Target Net, as shown in Figure 8. Each part includes an ActorNet and a CriticNet. The ActorNet outputs the control system parameters, and the CriticNet evaluates each action. Following the standard RL formulation, a finite Markov decision process is established, which comprises a current state s_t , an action a_t , a reward function r, and the next state s_t ₊₁. The reward function was adjusted repeatedly during the learning process, and the best results were obtained when the negative absolute value of the difference between the actual motion state of the DSRV and the target state was used as the system reward. Hence

r = [- |Δ u|, - |Δ v|, - |Δ ω|, - |Δ φ|, - |Δ θ|, - |Δ ψ|]

Figure 8.

Deep deterministic policy gradient.

where $Δ u = u - \hat{u}, Δ v = v - \hat{v}, Δ ω = ω - \hat{ω}, Δ φ = φ - \hat{φ}, Δ θ = θ - \hat{θ}, Δ ψ = ψ - \hat{ψ}$ ; $u, v, ω$ are the longitudinal, lateral, and vertical velocities of the vehicle, respectively; $φ$ , $θ$ , and $ψ$ are the heel, pitch, and heading angles of the vehicle, respectively; and $\hat{u}, \hat{v}, \hat{ω}, \hat{φ}, \hat{θ}, \hat{ψ}$ are the target states. When the target value is reached, $r = r + 1$ is set so that the controlled quantity stays stable at the target value.
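The reward described above can be sketched as a vector function of the state error. The tolerance used to decide that a target value "is reached" and the component-wise application of the +1 bonus are assumptions of this example.

```python
import numpy as np


def reward(state, target, tol=1e-2):
    """Six-component reward: negative absolute error plus a bonus at the target.

    state, target : arrays ordered as [u, v, w, roll, pitch, heading]
    tol           : hypothetical tolerance for treating a component as reached
    """
    error = np.abs(np.asarray(state, dtype=float) - np.asarray(target, dtype=float))
    r = -error
    r = r + (error < tol)  # r = r + 1 for every component that reached its target
    return r


target = np.array([1.0, 0.0, 0.0, 0.0, 0.1, 0.5])
print(reward(target, target))                                # all components reached
print(reward(target + np.array([0.5, 0, 0, 0, 0, 0]), target))  # surge speed off by 0.5
```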

Experience is sampled from the experience pool for learning. Transitions of the form $(s_{t}, a_{t}, r, s_{t + 1})$ , with the 6-DOF action a_t , are used as input for the six critic networks of Main Net, and each critic network calculates the value function of each action using the Bellman equation

Q^{π} (s_{t}, a_{t}) = E_{r_{t}, s_{t + 1} ∼ E} [r (s_{t}, a_{t}) + γ E_{a_{t + 1} \sim π} [Q^{π} (s_{t + 1}, a_{t + 1})]]

If the target policy is deterministic, then it can be described as a function $μ : S \rightarrow A$ that maps states to actions, and the inner expectation is avoided

Q^{μ} (s_{t}, a_{t}) = E_{r_{t}, s_{t + 1} ∼ E} [r (s_{t}, a_{t}) + γ [Q^{μ} (s_{t + 1}, μ (s_{t + 1}))]]

where γ is the discount factor. According to Q-learning³² and considering function approximators parameterized by $θ^{Q}$ , the loss function is the squared temporal difference error

L (θ^{Q}) = {(R + γ Q (s_{t + 1}, μ (s_{t + 1}) |θ^{Q}) - Q (s_{t}, a_{t} |θ^{Q}))}^{2}

The neural network is trained to minimize the loss function so that the actual Q value tends to the target Q value.
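The critic update can be illustrated with a small numerical sketch. It assumes the common formulation in which the loss is the mean squared difference between the predicted Q value and the target value r + γ Q(s', μ(s')); the function names and the numbers are hypothetical.

```python
import numpy as np


def td_targets(rewards, next_q, gamma=0.99):
    """Target Q values y = r + gamma * Q(s', mu(s')) for a batch of transitions."""
    return np.asarray(rewards, dtype=float) + gamma * np.asarray(next_q, dtype=float)


def critic_loss(q_pred, targets):
    """Mean squared temporal difference error over the batch."""
    diff = np.asarray(targets, dtype=float) - np.asarray(q_pred, dtype=float)
    return float(np.mean(diff ** 2))


# Hypothetical batch of two transitions
y = td_targets(rewards=[1.0, 0.0], next_q=[2.0, 4.0], gamma=0.5)
print(y)                           # target values
print(critic_loss([1.0, 3.0], y))  # loss for the current critic predictions
```

In a full implementation the values `next_q` would come from the target networks and the gradient of this loss would update the critic weights.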

On the basis of the policy gradient, a parameterized actor function $μ (s |θ^{μ})$ is maintained, which specifies the current policy by deterministically mapping states to a specific action. The parameters of the actor network are updated by the following function

\begin{matrix} \nabla_{θ^{μ}} J \approx \frac{1}{N} \sum_{t} \nabla_{θ^{μ}} Q (s, a | θ^{Q}) |_{s = s_{t}, a = μ (s_{t} | θ^{μ})} \\ = \frac{1}{N} \sum_{t} \nabla_{a} Q (s, a |θ^{Q}) |_{s = s_{t}, a = μ (s_{t})} \nabla_{θ^{μ}} μ (s |θ^{μ}) |_{s = s_{t}} \end{matrix}

where N is the number of sampled transitions used in each learning step. The idea of using DDPG to train the controller parameters is to input the speed and attitude errors of the DSRV into the algorithm as the state and to let the algorithm output the controller parameters. The neural network is trained to obtain the optimal parameters so that the motion state of the DSRV stabilizes at the target state.

The application of DDPG in DSRV control requires the establishment of a critic $Q (s_{t}, a_{t} |θ^{Q})$ and an actor $μ (s_{t} |θ^{μ})$ neural network structure. $θ^{Q}$ and $θ^{μ}$ are the weight parameters of the network. The action output of DDPG is considered the parameters of the control system. In combination with function $a = μ (s_{t} |θ^{μ})$ , the parameters can be expressed as

[k_{i 1}, k_{i 2}] = a_{k} = μ (s_{t} |θ^{μ})

[K_{P}, K_{I}, K_{D}] = a_{K} = μ (s_{t} |θ^{μ})

The two formulas above are trained in two separate reinforcement learning systems, and a_k and a_K represent the action outputs of the two systems, respectively. Thus, the controllers can be expressed as

\{\begin{array}{l} u_{i} = 2.0 / (1.0 + exp (- a_{k} [1] e_{i} - a_{k} [2] \dot{e_{i}})) - 1.0 + Δ u_{i} \\ f_{i} = K_{i} u_{i} \end{array}

t (k) = a_{K} [1] e (k) + a_{K} [2] \sum_{i = 0}^{k} e (i) + a_{K} [3] \dot{e} (k)

The learning-and-training process of the controller is shown in Algorithm 1.

Algorithm 1

Controller training process.

Python is used to build the training environment, and the training results are as follows:

In Figure 9, the abscissa represents the training episodes, and the ordinate represents the total reward of each episode. A total of 10,000 episodes are run during simulation training, and each episode consists of 500 update steps. Training covers a velocity range of 0–3 m/s, a heading angle range of −180° to 180°, and a pitch angle range of −45° to 45°. Figure 9 shows that the reward value converges to approximately 50, which indicates that the training is successful. After training in the Python simulation environment, the obtained parameters are applied to the DSRV. The self-learning ability of the controller parameters is retained: during actual operation, real-time online learning continuously optimizes the controller parameters so that they adapt to different and complex real environments.

Figure 9.

Training results of control parameters based on DDPG. DDPG: deep deterministic policy gradient.

Thrust allocation

The CS is equipped with six thrusters, including two main propellers, two side propellers, and two vertical propellers. The placement of the thrusters on the CS is shown in Figure 10. l ₁, l ₂, l ₃, l ₄, l ₅, and l ₆ are the moment arms for each propeller to the center of the CS. Thus, the relationship between the force required at the 4-DOF of CS can be obtained as follows

[\begin{matrix} X \\ Y \\ Z \\ N \end{matrix}] = [\begin{matrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ l_{1} & - l_{2} & l_{3} & - l_{4} & 0 & 0 \end{matrix}] [\begin{matrix} T_{1} \\ T_{2} \\ T_{3} \\ T_{4} \\ T_{5} \\ T_{6} \end{matrix}]

Figure 10.

Thruster plan of the CS underwater vehicle. CS: CHUAN SUO.

where X, Y, and Z are the forces along the three translational DOF; N is the heading torque; T₁ and T₂ are the thrusts of the main propellers; T₃ and T₄ are the thrusts of the side propellers; and T₅ and T₆ are the thrusts of the vertical propellers.

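Formula (31) describes a redundant propulsion system: six thrusts must be determined from four equations. Two constraints are therefore imposed: the main propellers adjust only the longitudinal velocity, and the vertical propellers adjust only the vertical velocity, so that neither pair produces a net moment. Thus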

\begin{cases} T_{1} l_{1} - T_{2} l_{2} = 0 \\ T_{5} l_{5} - T_{6} l_{6} = 0 \end{cases}

where

\begin{cases} l_{1} = l_{2} \\ l_{5} = l_{6} \end{cases}

Combining the allocation relation with these constraints gives

\begin{cases} T_{1} = T_{2} = X / 2 \\ T_{5} = T_{6} = Z / 2 \\ T_{3} = \dfrac{Y l_{4} + N}{l_{3} + l_{4}} \\ T_{4} = \dfrac{Y l_{3} - N}{l_{3} + l_{4}} \end{cases}
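The expressions for T₃ and T₄ can be checked with a short derivation, sketched here under the constraints stated above (the main propellers produce no net heading moment). The Y and N rows of the allocation matrix then reduce to a system of two equations in T₃ and T₄:

```latex
\begin{aligned}
Y &= T_{3} + T_{4}, \\
N &= l_{3} T_{3} - l_{4} T_{4}.
\end{aligned}
% Multiply the first equation by l_4 and add the second:
\qquad Y l_{4} + N = (l_{3} + l_{4})\, T_{3}
% Multiply the first equation by l_3 and subtract the second:
\qquad Y l_{3} - N = (l_{3} + l_{4})\, T_{4}
```

Dividing each result by l₃ + l₄ recovers the two side-propeller thrusts, and substituting them back returns T₃ + T₄ = Y and l₃T₃ − l₄T₄ = N.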

When the required T₁, T₂, T₅, or T₆ exceeds $T_{max}$, the maximum thrust that a propeller can provide, the thrust is limited to $T_{max}$. Thus

|T_{1}| = |T_{2}| = T_{max}, \quad |T_{5}| = |T_{6}| = T_{max}

Two situations exist for T₃ and T₄. First, when both are greater than $T_{max}$

|T_{3}| = |T_{4}| = T_{max}

Second, when only one of them is greater than $T_{max}$, for example T₃ > $T_{max}$ > T₄. T₃ and T₄ together adjust both the lateral and heading motions of the CS. The principle of “heading priority” is adopted for the thrust allocation; that is, the bow-turning torque is satisfied first

\begin{cases} |T_{3}| = T_{max} \\ |T_{4}| = \left|\dfrac{T_{max} l_{3} - N}{l_{4}}\right| \end{cases}

If $|T_{4}| > T_{max}$, then $|T_{4}| = T_{max}$ is used.
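The allocation rules of this section can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used on the CS: the function name, the argument names, and the numeric values are assumptions. The text states the saturation rules only for magnitudes and only for the case T₃ > T_max > T₄; keeping the sign of each saturated thrust and treating the mirrored case (only T₄ saturated) symmetrically are additional assumptions.

```python
def allocate_thrust(x, y, z, n, l3, l4, t_max):
    """Return [T1..T6] for demanded forces x, y, z and heading torque n.

    Assumes l1 == l2 and l5 == l6, so each of the main and vertical
    pairs shares its load equally and produces no net moment.
    """
    def clamp(value):
        # Limit a thrust to the range [-t_max, t_max], keeping its sign.
        return max(-t_max, min(t_max, value))

    # Main and vertical propellers: equal split, then saturation.
    t1 = t2 = clamp(x / 2.0)
    t5 = t6 = clamp(z / 2.0)

    # Side propellers: closed-form solution of Y = T3 + T4 and
    # N = l3*T3 - l4*T4.
    t3 = (y * l4 + n) / (l3 + l4)
    t4 = (y * l3 - n) / (l3 + l4)

    over3 = abs(t3) > t_max
    over4 = abs(t4) > t_max
    if over3 and over4:
        # Both saturated: limit both.
        t3, t4 = clamp(t3), clamp(t4)
    elif over3:
        # Heading priority: fix T3 at its limit and choose T4 so that
        # the torque n is still met, then limit T4 if needed.
        t3 = clamp(t3)
        t4 = clamp((l3 * t3 - n) / l4)
    elif over4:
        # Mirrored case (assumption, not stated in the text).
        t4 = clamp(t4)
        t3 = clamp((n + l4 * t4) / l3)

    return [t1, t2, t3, t4, t5, t6]


# Hypothetical demand in which only T3 exceeds the limit.
thrusts = allocate_thrust(x=10.0, y=20.0, z=6.0, n=6.0,
                          l3=1.0, l4=1.0, t_max=10.0)
```

In the heading-priority branch the lateral force actually delivered, T₃ + T₄, falls short of the demanded Y, while the torque l₃T₃ − l₄T₄ still equals N as long as T₄ itself stays within the limit.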

Simulation experiment

Deep-dive rescue is divided into four stages. The first stage is the approach stage. The DSRV is usually hoisted from the mother ship, lowered vertically, and dives after entering the water. It then sails at a fixed depth, heading, and velocity to within 10 m of the wrecked submarine. The second stage is the 4-DOF dynamic positioning stage, in which the heading and the longitudinal, transverse, and vertical positions are controlled. The DSRV eventually hovers at a set altitude above the wrecked submarine. In the third stage, the attitude is adjusted on the basis of the 4-DOF dynamic positioning so that the DSRV hovers over the wrecked submarine at the target trim and heeling angles, which achieves 6-DOF dynamic positioning. The fourth stage is the approach and lifesaving stage. Once the DSRV has descended to a certain height, its attitude is held and its three translational positions are controlled so that the docking structure of the DSRV connects with the rescue platform of the wrecked submarine. Misalignment or skew may occur during this process, in which case the position or attitude must be adjusted so that the DSRV can seat on the rescue platform. Finally, once atmospheric pressure has been established in the DSRV passage, the people on the wrecked submarine can safely enter the DSRV, which completes the lifesaving stage.

A simulation test environment is built to simulate the docking process between the DSRV and the submarine, as shown in Figure 11. Figure 11(a) shows the DSRV sailing toward the target wrecked submarine. Figure 11(b) shows the DSRV entering the communication range of the USBL. Figure 11(c) shows the DSRV being positioned under visual and acoustic guidance and the docking being completed by the controller.

Figure 11.

Simulation of the deep submergence rescue: (a) the first stage, (b) the second stage, and (c) the third and fourth stages.

In the simulation, the DSRV adjusts its heading and sails toward the wrecked submarine at a fixed depth and speed. Figure 12(a) and (b) shows the control results. When the DSRV comes within approximately 10 m of the wrecked submarine, its transverse, longitudinal, and vertical positions and its heading are held at their target values to achieve 4-DOF dynamic positioning, and it hovers over the wrecked submarine. On the basis of the 4-DOF dynamic positioning, the attitude of the DSRV is then adjusted so that it hovers over the wrecked submarine at the target trim and heeling angles, which achieves 6-DOF dynamic positioning. Figure 12(d) shows the attitude control results. Figure 12(c) shows that, after the DSRV approaches the height of the platform, its three positions are shifted while the three attitude angles are maintained, so that the skirt of the DSRV connects with the rescue platform of the wrecked submarine.

Figure 12.

Simulation results: (a) speed control result, (b) heading control result, (c) position control result, and (d) attitude control results.

According to the simulation results, the DSRV can successfully reach the target position to complete the docking rescue task using the proposed control method. The control effect is stable, and the convergence speed is fast. Therefore, the adaptive control method proposed in this study is effective and feasible.

Tank experiment

A tank experiment was performed, as shown in Figure 13. The docking device adopts an articulated skirt structure located in the middle of the DSRV (circled in red), and the scale ratio of the carrier is 1:2. The experiment was conducted in a deep pool laboratory; the pool is 50 m long, 30 m wide, and 10 m deep.

Figure 13.

Tank experiment: (a) the articulated skirt and (b) installation of the articulated skirt.

In the test, the DSRV reaches the target position by controlling its propellers and stabilizes there. It then dives to the target depth and adjusts its heading, trim, and heel to the target values.

Figure 14(a) and (b) shows the horizontal (transverse) and longitudinal position control curves, respectively. The horizontal axis is the running time of the system, and the vertical axis is the transverse or longitudinal position of the DSRV in the geodetic coordinate system, whose origin is placed at the center of the tank. The results show that the proposed DDPG-based variable parameter control method holds the DSRV at the target position well and brings it to the target position faster and more steadily than the traditional control method. Figure 14(c) shows the depth control curve of the DSRV. After reaching the target position, the DSRV dives to a depth of 6 m, and the controller stabilizes it at this target depth. Figure 14(d) to (f) shows the heading, trim, and heeling control curves of the DSRV, respectively. The trim and heel of the DSRV are controlled through water tank adjustment. The control curves of the proposed method are stable. In conclusion, compared with the ordinary fixed parameter control method, the DDPG-based variable parameter control method has a certain resistance to currents and adaptability, and it controls DSRV navigation and docking more quickly and steadily.

Figure 14.

Control results of the tank experiment: (a) horizontal position control results, (b) longitudinal position control results, (c) depth control results, (d) heading control results, (e) trim control results, (f) heeling control results.

Figure 15 shows that the docking is completed with acoustic and visual guidance after the adjustment of the position and attitude of the DSRV.

Figure 15.

Screenshots of the docking process: (a) towards the target and (b) completing docking.

According to the test results, combining vision and acoustics can successfully guide the DSRV to complete the docking task when the posture of the wrecked submarine is unknown. Compared with the ordinary control method, the DDPG-based parameter adjustment control method adapts better to the environment, has a certain resistance to currents, and controls the DSRV to complete the docking task more quickly and steadily.

Conclusions

This study proposes a guidance method that combines vision and acoustics to address the difficulty of determining the attitude of the wrecked submarine and the control instability encountered in submarine rescue. A variable parameter controller based on reinforcement learning is designed. The effectiveness of the proposed guidance method is demonstrated by simulation and tank tests. By combining vision and acoustics, the attitude of the submarine can be measured during docking so that the attitude of the DSRV is adjusted in real time. The process of obtaining information about the wrecked submarine in advance is omitted; hence, considerable detection time is saved, and rescue efficiency is improved. The automatic matching problem of a submarine with large inclination under unfavorable sea conditions is solved. The controller with the DDPG-trained parameters and the controller without training are compared through a tank experiment; the results show that the variable parameter controller based on reinforcement learning achieves better control performance. Moreover, the proposed controller is better suited to complex and changeable underwater environments and improves the accuracy and stability of the docking process.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was supported in part by the Equipment Pre-research Project (Project Number 41412030201), and the China National Natural Science Foundation (Project Numbers 51779057 and 51709061).

ORCID iD

Jian Cao
