Abstract
The prescribed performance control of a morphing aircraft with variable sweep wings is investigated based on switched nonlinear systems and reinforcement learning. Switched nonlinear systems in lower triangular form are first adopted to describe the longitudinal altitude motion, and an error transformation is applied to handle the prescribed performance bound. Then, the designed controller is divided into the basic part and supplementary part. For the basic part, the backstepping method with involvement of the modified dynamic surface control technique is utilized to avoid the “explosion of complexity” problem. Improved disturbance observers inspired from the idea of extended state observer are then designed to estimate the disturbances and combined with radial basis function neural networks to develop the common virtual control laws. Moreover, by using the error variables defined in the backstepping design, the reinforcement learning–based supplementary part controller is devised with the critic-action neural network structure, which can adjust its parameters online and further decrease the altitude tracking error. It is proved that all signals of the closed-loop system are uniformly ultimately bounded, and the prescribed performance bound for convergence of the altitude tracking error can be satisfied. Finally, comparative simulations demonstrate the effectiveness of the proposed control approach.
Introduction
The morphing aircraft can alter its aerodynamic configuration to expand mission capabilities and obtain optimal flight performance.1–3 The change in aircraft geometrical shape resolves performance conflicts between high-speed and low-speed regimes. However, the large variations in mass distribution and applied aerodynamic forces during the morphing process, together with internal uncertainties and external disturbances, may destabilize the aircraft. Hence, compared with traditional flight vehicles, the control of morphing aircraft is more challenging and has been explored by researchers worldwide in recent years.4,5
The extant research on morphing aircraft control mainly focuses on the longitudinal dynamics, where several representative control methods have been investigated based on linear parameter varying (LPV) models or general nonlinear systems. An inner-loop linear quadratic controller is combined with an outer-loop gain-scheduled controller in Yue et al.6 to guarantee stability. Linear matrix inequality and sliding mode–based approaches are investigated in He et al.7 and Wen et al.,8 respectively, for the LPV model. A high-order chained differentiator is combined with neural networks (NNs) to determine the control input in Wu et al.,9 where a coordinate transformation is adopted to obtain a normal form of the aircraft motion model. Adaptive dynamic surface control is proposed in Wu et al.10 to deal with input–output constraints via a barrier Lyapunov function, where system uncertainties are approximated by NNs. The constraints on the active region of the NNs are further relaxed in Wu et al.11 by a smooth switching between the neural controller and a robust controller. He et al.12 adopt NNs to deal with the model uncertainties and design a disturbance observer to attenuate the effects of the disturbances of a flapping wing micro aerial vehicle.
It is noted that the aforementioned LPV model–based approaches may degrade the control performance when the nonlinearities are severe. Meanwhile, the variations in mass distribution and aerodynamic forces are difficult to approximate by a single NN in Wu et al.9–11 due to their time-varying characteristics. In contrast, switched systems provide a powerful tool for modeling morphing aircraft dynamics, where the switching law describes the morphing process and the subsystems correspond to specific configurations along the transition trajectory. The longitudinal dynamics of a morphing aircraft with variable sweepback wings is approximated by uncertain switched linear systems in Wang et al.,13 where a robust state feedback controller is designed to guarantee finite-time boundedness of the closed-loop system. Switched LPV models in the continuous domain are established in Jiang et al.,14 and the smooth switching control problem is investigated via overlapped scheduled parameter subsets. The non-fragile
Considerable progress has been made in stability analysis and controller design for switched nonlinear systems during the past decade,17–21 where the studied control schemes mainly address stability of the closed-loop system rather than the performance of tracking errors. Recently, the prescribed performance control proposed in Bechlioulis and Rovithakis22–24 and developed for various classes of nonlinear systems in previous studies25–27 has been extended to switched nonlinear systems. The main idea of prescribed performance control is a coordinate transformation, after which the convergence of the tracking error can be guaranteed with a prescribed decay rate, maximum overshoot, and steady-state value. The prescribed performance control of switched nonlinear systems in non-strict-feedback form is investigated in Li et al.,28 where semi-globally uniform ultimate boundedness is achieved. A prescribed performance adaptive controller is developed in Li and Xiang29 to deal with input saturation and unmodelled dynamics for a class of switched nonlinear systems. Mode-dependent adaptive laws are devised in Zhai et al.30 to reflect the switchings between subsystems, which is less conservative than common adaptive laws. However, to the best of our knowledge, the control of morphing aircraft via switched nonlinear systems has not been fully investigated yet. Besides, the extended state observer (ESO)-based control approach provides a universal way of dealing with disturbances,31,32 where the disturbances are regarded as an extended state estimated by the ESO and compensated for by the controller. It is noteworthy that research on ESOs for switched nonlinear systems is also rare.
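As a concrete illustration of the ESO idea mentioned above, the following minimal sketch (not the observers designed later in this paper) treats the lumped disturbance of a double integrator as an extended state and estimates it with a bandwidth-parameterized linear ESO; the gains, the disturbance signal, and the simple stabilizing feedback are all illustrative assumptions.

```python
import numpy as np

def simulate_eso(omega_o=30.0, dt=1e-3, T=5.0):
    """Linear ESO for a double integrator x1'' = u + d(t): the lumped
    disturbance d is treated as an extended state z3 and estimated from
    the output x1 alone (bandwidth parameterization of the gains)."""
    b1, b2, b3 = 3*omega_o, 3*omega_o**2, omega_o**3
    x1, x2 = 0.0, 0.0            # plant states
    z1, z2, z3 = 0.0, 0.0, 0.0   # observer states (z3 estimates d)
    for k in range(int(T/dt)):
        t = k*dt
        d = 1.0 + 0.5*np.sin(t)        # unknown disturbance (simulation only)
        u = -2.0*x1 - 2.0*x2 - z3      # stabilizer plus disturbance compensation
        x1, x2 = x1 + dt*x2, x2 + dt*(u + d)   # plant, forward Euler
        e = x1 - z1                    # output estimation error
        z1, z2, z3 = (z1 + dt*(z2 + b1*e),
                      z2 + dt*(z3 + u + b2*e),
                      z3 + dt*(b3*e))
    return z3, 1.0 + 0.5*np.sin(T)

d_hat, d_true = simulate_eso()
```

With the observer bandwidth well above the disturbance frequency, z3 converges to a neighborhood of d(t); enlarging omega_o shrinks the estimation error at the cost of noise sensitivity.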
Meanwhile, reinforcement learning has been studied to find optimal control solutions for various classes of systems over the last few decades.33–35 The critic-action structure is commonly utilized in reinforcement learning, where two agents, called the critic agent and the action agent, are adopted to estimate the performance index and revise the control policy, respectively. The reinforcement learning control problems of nonlinear systems with partially and fully unknown dynamics are addressed in the literature,36–38 where NNs are adopted to implement the algorithms. Reinforcement learning control has also been applied to robot manipulators,39 hypersonic vehicles,40 autonomous underwater vehicles,41 and so on. In particular, it is worth pointing out that Mu et al.40 combine sliding mode control and reinforcement learning to solve the tracking problem of hypersonic vehicles, where a supplementary strategy is generated by the reinforcement learning approach to further reduce tracking errors and improve control performance. In view of its online learning and adaptation merits, it is desirable to integrate reinforcement learning into morphing aircraft controller design.
Motivated by the above considerations, this paper investigates prescribed performance control of morphing aircraft based on switched nonlinear systems and reinforcement learning, where the prescribed performance bound is handled by error transformation. The main contributions of our work are summarized as follows:
The longitudinal altitude motion of morphing aircraft is modeled as switched nonlinear systems in lower triangular form, and it provides an efficient way to cope with the variations in mass distribution and applied aerodynamic forces.
The designed controller is split into two parts: the basic part and the supplementary part. For the basic part, the backstepping approach with integration of the dynamic surface control technique is exploited to eliminate the “explosion of complexity” problem. Disturbance observers inspired by the idea of the ESO are designed and combined with radial basis function (RBF) NNs to generate the common virtual control laws.
The critic-action NN structure of the reinforcement learning scheme is adopted to devise the supplementary part on the basis of tracking errors defined in the backstepping design. The weight updating algorithm of the established critic-action NN structure and related convergence analysis are provided. The improved control performance via generating a supplementary strategy is validated by comparative simulation studies.
The remaining part of this study is organized as follows. Section “Model description” states the modeling of morphing aircraft by switched nonlinear systems. Section “Controller design” presents the basic part of the controller based on the switched nonlinear systems and the supplementary part of the controller based on reinforcement learning. Comparative simulation studies are addressed in section “Numerical simulation,” and final conclusion is drawn in section “Conclusion.”
Notations
The notations in this paper are standard.
Model description
The morphing aircraft with variable sweep wings considered in this study is similar to that in Wu et al.,9–11 and the longitudinal motion model is formulated as10
where
where
where
Assumption 1
The flight path angle
where
In this paper, the RBF NNs are applied to Equation (2) for obtaining common virtual control laws in the backstepping design procedure. For any continuous function
where
where
with
Lemma 1
The following inequality is satisfied for any
Controller design
The controller designed in this paper comprises two parts, that is,

Figure 1. Structure diagram of the proposed controller.
Basic part design
The following assumptions are made for the switched nonlinear systems (2).
Assumption 2
The reference altitude
Assumption 3
The velocity
Assumption 4
The time derivatives of disturbances
Assumption 5
For any
Remark 1
Assumption 2 is rather general in the literature on aircraft longitudinal motion control.43,44 Assumption 3 is reasonable since the thrust force, drag force, and inertial force caused by morphing are indeed continuous and bounded. Assumption 4 is commonly adopted in extant references on disturbance observer design31,32,45 and facilitates the convergence analysis of the disturbance observer. Assumption 5 guarantees that the designed control laws are nonsingular21 and can be satisfied by the model used in the numerical simulation.
The basic part of the controller is designed by the backstepping technique, which is described as follows.
Step 1. Define
where
where
with
It is shown in Bechlioulis and Rovithakis24 that the prescribed performance of altitude tracking error
where
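For illustration, a common symmetric form of the performance function and error transformation from the prescribed performance literature is sketched below; the exact bounds and transformation used in this paper are not reproduced, and all parameter values are assumptions.

```python
import numpy as np

def rho(t, rho0=2.0, rho_inf=0.1, l=1.0):
    """Performance function: decays from rho0 to rho_inf at rate l."""
    return (rho0 - rho_inf) * np.exp(-l * t) + rho_inf

def transform(e, t):
    """Symmetric transformation eps = S^{-1}(e/rho); eps remains finite
    if and only if -rho(t) < e(t) < rho(t)."""
    lam = e / rho(t)
    return 0.5 * np.log((1.0 + lam) / (1.0 - lam))

t = np.array([0.0, 1.0, 3.0])
e = 1.5 * np.exp(-2.0 * t)     # a tracking error that stays inside the bound
eps = transform(e, t)          # finite, so the bound is respected
```

Keeping the transformed error eps bounded, which is what the controller enforces, then guarantees that the original tracking error evolves inside the prescribed envelope with decay rate l, overshoot bound rho0, and steady-state bound rho_inf.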
Define
Choosing the following Lyapunov function candidate
The time derivative of
with
Step 2. Taking the time derivative of
where
where
With the disturbance estimation
where
By Equation (19), it can be verified that
with
where
Define
where
From Equation (24), the virtual control law
where
with
where
Step 3. Differentiating
Let
Defining
Taking
delivers
with
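The dynamic surface technique used in these steps replaces analytic differentiation of the virtual control laws by first-order filtering; a minimal sketch of that filter follows (illustrative signal and time constant, and without the coupling terms of the paper's modified surfaces).

```python
import numpy as np

def dsc_filter(alpha, tau, dt):
    """First-order filter tau*zdot + z = alpha: z tracks the virtual
    control alpha, and zdot = (alpha - z)/tau replaces its analytic
    derivative in the next backstepping step."""
    z = alpha[0]
    zs, zdots = [], []
    for a in alpha:
        zdot = (a - z) / tau
        z = z + dt * zdot
        zs.append(z)
        zdots.append(zdot)
    return np.array(zs), np.array(zdots)

dt = 1e-3
t = np.arange(0.0, 5.0, dt)
alpha = np.sin(t)                       # stand-in virtual control signal
z, zdot = dsc_filter(alpha, tau=0.02, dt=dt)
```

A smaller tau gives a more accurate derivative estimate but amplifies noise; this is the trade-off behind avoiding the "explosion of complexity" of repeated analytic differentiation.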
Step 4. The time derivative of
The improved disturbance observer for the
where
With the estimated value
for any given constant
Since
where
where
where
Consequently, the final basic control input
with
where
Remark 2
It should be pointed out that the nonlinearities, including the disturbances of the systems, cannot be canceled out directly when designing the common virtual control laws since the systems are of switched type. By the adoption of NNs, the focus shifts to the upper bound of the related weight vector, which is estimated and used to generate the common virtual control laws. Specifically, the updating law for this upper-bound estimate is designed to ensure the stability of the closed-loop system via inequality properties. Such a technique is generally adopted in research on backstepping control of switched nonlinear systems.21
Remark 3
We note that there are also several excellent studies on neural learning design. For instance, tracking control of a flexible hypersonic vehicle via a combination of neural approximation and disturbance estimation is addressed in Xu et al.,43 where both the tracking error and the prediction error are exploited to generate the weight updating law of the NNs. The robust design is integrated into the updating laws for both the weight vector and the approximation error of the adopted NNs in Xu46 to develop a direct neural controller. The research in Xu et al.43,46 mainly adopts general nonlinear systems in lower triangular form, and no system switching is involved. As a result, the nonlinearity of the system can be canceled out directly via the employment of NNs as well as disturbance observers. In contrast, the dynamics of the morphing aircraft here is described by switched nonlinear systems, whose nonlinearities cannot be eliminated directly since they switch as the sweep angle of the morphing aircraft varies; this distinguishes our work from the research in Xu et al.43,46
For the basic part controller, we have the following theorem:
Theorem 1
Consider the closed-loop switched nonlinear systems formed of the plant (2); the improved disturbance observers (17) and (34); the modified dynamic surfaces (12), (22), and (29); the parameter update laws (26) and (42); and the basic control law (41). Suppose that
Proof
Define scaled estimation errors of the designed disturbance observers (17) and (34) as follows
and the dynamics of the scaled estimation errors
Let
with
The constants
then the derivative of
For the closed-loop system, consider now the Lyapunov function candidate
Invoking Equations (27), (32), and (43), it can be shown that
By Young’s inequality,47 one gets
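For reference, the form of Young's inequality typically invoked at this point (with generic scalars, not the paper's specific terms) is

```latex
ab \;\le\; \frac{\epsilon}{2}\,a^{2} + \frac{1}{2\epsilon}\,b^{2},
\qquad \forall\, a,b \in \mathbb{R},\; \epsilon > 0,
```

which splits cross terms in the Lyapunov derivative into quadratic terms that can be absorbed into the negative-definite part through the choice of $\epsilon$.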
Moreover, with the definition of
Substituting Equations (50) and (51) into Equation (49) yields
By Assumptions 2 and 3, for given positive constants
Note that with
Setting the parameters to satisfy
Then
where
Using the comparison principle in Equation (55), we have
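The comparison principle step has the standard form (generic symbols; the paper's specific constants in Equation (55) are not reproduced here): if the Lyapunov function satisfies a differential inequality with positive constants $\kappa$ and $C$, then

```latex
\dot{V}(t) \le -\kappa V(t) + C
\;\Longrightarrow\;
V(t) \le V(0)\,e^{-\kappa t} + \frac{C}{\kappa}\bigl(1 - e^{-\kappa t}\bigr)
\le V(0)\,e^{-\kappa t} + \frac{C}{\kappa},
```

so $V$ is uniformly ultimately bounded by a ball of radius $C/\kappa$, which underlies the boundedness claim of Theorem 1.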
It is straightforward to see that all the signals in the closed-loop system are bounded, and the transformed error
Remark 4
It is noteworthy that the improved disturbance observers (17) and (34) and the modified dynamic surfaces (12), (22), and (29) contain the coupling terms related to
Supplementary part design
To further improve the control performance, the reinforcement learning approach is applied to design the supplementary part controller, where both the critic network and the action network are RBF NNs. The critic network mainly serves to estimate a user-defined cost function, while the action network mainly generates the control strategy that minimizes the estimated value of the cost function. The structure of the reinforcement learning scheme is shown in Figure 2. It is noted that the error variables defined in the backstepping design are regarded as inputs of both the critic network and the action network, and a supplementary control input is generated correspondingly.

Figure 2. Structure diagram of the reinforcement learning scheme.
The input of the critic network is denoted by
where
The error function for training the critic network is then constructed as
where
The weight updating of the critic network is to minimize
where
The action network takes
where
where
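As a self-contained illustration of the critic–action updating mechanism, consider the generic scalar actor-critic sketch below (quadratic features on an assumed linear system; this is not the paper's updating laws, and the dynamics, gains, and learning rates are all illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, gamma = 0.9, 1.0, 0.95     # assumed scalar dynamics x' = a*x + b*u
w, v = 0.0, 0.0                  # critic: V_hat(x) = w*x^2; action: u = v*x
lr_c, lr_a = 0.05, 0.01          # gradient-descent learning rates

for episode in range(300):
    x = rng.uniform(-1.0, 1.0)
    for k in range(30):
        u = v * x
        cost = x**2 + u**2                       # one-step cost signal
        x_next = a * x + b * u
        # critic: TD(0) step toward the Bellman target cost + gamma*V_hat(x_next)
        delta = cost + gamma * w * x_next**2 - w * x**2
        w += lr_c * delta * x**2
        # action: gradient step on cost + gamma*V_hat(x_next) with respect to v
        grad_v = (2.0 * u + 2.0 * gamma * w * b * x_next) * x
        v -= lr_a * grad_v
        x = x_next
```

The critic drives the temporal-difference error toward zero while the action weight descends the estimated cost-to-go, so the learned feedback v becomes negative and stabilizing. In the paper, the inputs of both networks are instead the backstepping error variables, and the action network output is added to the basic control law as a supplementary input.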
Remark 5
The reinforcement learning method adapts to an uncertain environment without a model. Specifically, reinforcement learning is a data-driven framework for solving optimal control problems, and the requirement of a model of the system dynamics, or even an explicit formulation of the reward function, can be avoided. However, as stated in Buşoniu et al.,34 the model-free property of the reinforcement learning method does not mean that knowledge of the system dynamics cannot be employed when a model, whether partial or full, is in fact available. The model of the system dynamics can be exploited to develop reinforcement learning algorithms depending upon the specific conditions, and many related reinforcement learning approaches have been proposed for computing the optimal control input of general affine nonlinear systems in theory.35–37 To summarize, the reinforcement learning method is not restricted by the availability of system model information, and the model can be appropriately adopted once available. In this paper, the reinforcement learning method without model information is devised to further decrease the altitude tracking error, since the system model in the presence of the basic part controller is too complex and hence regarded as unknown.
Remark 6
For the supplementary part, RBF NNs are exploited to constitute the critic–action structure of the reinforcement learning algorithm. Both the critic network and the action network can be of any appropriate type, such as the multi-layer perceptron structure with one hidden layer,40 and the adoption of RBF NNs is mainly for the convenience of weight updating and convergence analysis. The weight updating for the critic network and the action network mainly follows the reinforcement learning approach in a gradient-descent manner. It is noted that the specific choice and structure of the critic network and the action network is not the focus of our work. We aim to show that the integration of the reinforcement learning technique can further improve the control performance, which is demonstrated in the numerical simulation part.
For the supplementary part controller, we have the following theorem:
Theorem 2
Consider the critic–action NN structure established as Equations (61) and (64). Let
Proof
By Equation (61), the dynamics of
Notice that
Using
Substituting Equation (68) into Equation (66) delivers
Moreover, the dynamics of
Choosing the Lyapunov function candidate as
the derivative of
By Young’s inequality,47 it follows that
where
Case 1.
where
Case 2.
Summarizing the two cases, it can be verified that
Remark 7
The optimal cost function satisfies the Bellman equation according to optimal control theory, which has been addressed for general nonlinear systems in Buşoniu et al.34 and the references therein. Since the exact solution of the Bellman equation is difficult to obtain, NNs are generally adopted in the reinforcement learning technique to generate an approximate solution. It is noteworthy that the recursive relationship of the Bellman equation is critical for constructing the weight updating laws of the NNs, as can be seen from Equations (58) and (66).
Remark 8
We note that the stability analysis in the proof of Theorem 2 mainly concerns the boundedness of the weight estimation errors
Remark 9
We note that the prescribed performance technique adopted in this paper follows Bechlioulis and Rovithakis,24 and many researchers have addressed prescribed performance control in recent years.49–51 We do not emphasize any specific type of prescribed performance control method. The focus of our work is a union of prescribed performance control and the reinforcement learning approach, which is achieved by the basic part and supplementary part controllers, respectively. Nevertheless, we acknowledge that the specific design of the prescribed performance control method is important to the final control performance, which will be a topic of our further research.
Numerical simulation
The effectiveness of the proposed control scheme is illustrated by a comparative simulation study with the method in Wu et al.,10 where the morphing aircraft model parameters are mainly taken from Wu et al.9
and 20% uncertainties of aerodynamic coefficients are considered. Besides, we set
For the reinforcement learning–based supplementary part, the critic network has 30 nodes with the centers evenly spaced on

Figure 3. Altitude tracking.
Figure 4. Altitude tracking error.
Figure 5. Altitude tracking error (m1 and m1 + RL).
Figure 6. The control input
Figure 7. The response of
Figure 8. The response of
Figure 9. The responses of
Figure 10. The response of
The altitude tracking responses of all three methods are depicted in Figures 3–5, and the boundedness of the altitude tracking error is guaranteed in each case. However, the m2 method cannot ensure the prescribed performance bound, while the m1 and m1 + RL methods achieve the control goal. Besides, it is easy to see from Figure 5 that, compared with the m1 method, the m1 + RL method has better transient performance and a smaller steady-state error, which illustrates the effectiveness of the reinforcement learning.
The related control inputs are given in Figure 6, which are all less than

Figure 11. The response of
Remark 10
The benefits of integrating reinforcement learning have been demonstrated by the comparative simulation studies. The results show that the m1 method outperforms the m2 method in terms of prescribed performance of the altitude tracking, which indicates the advantage of the backstepping design via improved disturbance observers and the dynamic surface technique. Furthermore, both the m1 method and the m1 + RL method ensure the prescribed performance bound, while the m1 + RL method has a smaller tracking error than the m1 method. Hence, the integration of reinforcement learning into the controller design can further improve control performance. It is noteworthy that the design philosophy of the artificial intelligence community differs from that of the control community, although both have contributed greatly to the development of reinforcement learning.34 The former cares about the convergence of the learning algorithm to the optimal solution, while stability during the learning process is often neglected. The latter focuses on closed-loop stability, and convergence to the optimal solution is not a priority. As a result, most extant reinforcement learning methods cannot be applied directly to systems with disturbances. In our work, the basic part controller can be considered as providing a stable environment for the reinforcement learning scheme.
Conclusion
This paper addresses the prescribed performance control of the longitudinal altitude of a morphing aircraft based on switched nonlinear systems and reinforcement learning. Switched nonlinear systems in lower triangular form are first derived to describe the morphing aircraft dynamics, and the prescribed performance bound of the altitude tracking is handled by error transformation. Then, the controller is designed as the integration of two parts: the basic part and the supplementary part. The backstepping scheme is adopted to devise the basic part, and the modified dynamic surface control technique is involved to avoid the “explosion of complexity” problem. The common control laws are obtained by uniting improved disturbance observers and RBF NNs with consideration of the disturbances of the systems. Besides, a critic–action NN structure of reinforcement learning is applied to develop the supplementary part controller via the error variables defined in the backstepping design. Comparative simulations clearly show that the altitude tracking error satisfies the prescribed performance bound under the proposed control method. In particular, the integration of reinforcement learning into the controller design can further improve the control performance when compared with the controller containing the basic part only.
Footnotes
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive comments on improving this paper.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This study was supported by the National Natural Science Foundation of China (Grant Nos 61873295 and 61833016), the Aeronautical Science Foundation of China (Grant No. 2016ZA51011), and the Shanghai Aerospace Science and Technology Innovation Foundation (Grant No. SAST2017-096).
