Abstract

In this study, an optimal adaptive control approach is established to solve the robust output tracking problem for a class of continuous-time uncertain linear systems, based on policy iteration (PI) in the actor-critic algorithm. First, by augmenting the state variables with the integrals of the tracking error, the robust tracking problem is transformed into a robust control problem for an augmented uncertain linear system. It is proven that the robust control law of the augmented system enables the output of the considered system to track a polynomial time signal asymptotically. Second, an optimal control method for the corresponding auxiliary nominal system is established, and based on the Bellman optimality principle, PI algorithms are proposed to solve for the tracking controllers online for both matched and mismatched uncertain systems. Finally, two numerical experiments are provided to test the effectiveness of the proposed approach and the theoretical results.

Keywords

Policy iteration, uncertain linear system, robust tracking, optimal control

Introduction

Control systems in many practical problems are subject to uncertainties caused by data errors or disturbances.¹ Therefore, the robust control problem of uncertain systems has garnered significant attention. This study introduces an adaptive optimal control method that designs a robust tracking controller based on the policy iteration (PI) of the actor-critic algorithm. Adaptive optimal control is a control method that automatically adjusts the controller’s parameters to optimize the performance of the control system as the system parameters or the environment change, and it is widely used in control practice.²

Since the 1980s, the robust control problem of practical uncertain systems has been a research hotspot. Algebraic Riccati equations (AREs) were utilized to propose a robust control design for some uncertain systems.^3,4 The robust control of time-varying uncertain linear systems was studied,⁵ and the controller parameters were obtained using a one-dimensional search method; this method is effective when the system satisfies the matching condition. Using a noniterative method, a robust controller was obtained for a matched system.⁶ A disadvantage of this design method is that it requires a pre-compensator when the nominal system is unstable. In addition, other effective robust control-design methods were proposed for general matched uncertain systems.^7,8 However, only a few results are available for mismatched systems with uncertainty in the input matrix.

The robust tracking problem has been investigated in many studies.^9–12,32,34 Schmitendorf et al. proposed a robust tracking control approach for time-invariant uncertain systems with unknown constant disturbances.⁹ Benson and Schmitendorf considered a robust tracking problem by using an observer-based augmented system.¹⁰ Utilizing an improved linear quadratic optimal control method, Shieh et al. designed a robust tracking law for uncertain linear systems.¹¹ In another approach, the robust tracking problem for an uncertain system was turned into an optimal control problem;¹² for linear systems, this is a linear regulator problem, which is solved using an ARE.

Furthermore, PI in reinforcement learning (RL) has been utilized extensively to solve control and tracking problems for deterministic systems. For feedback control, see the literature.^13–15 Several studies have also used PI algorithms to solve tracking problems. An online RL method was established to obtain a linear tracking controller for partially unknown linear systems with unstable tracking signals.¹⁶ By constructing an augmented system comprising the state variable and the tracking signal, the tracking problem was turned into an optimal control problem, so the optimal tracking cost is quadratic as well. An adaptive dynamic programming (ADP) algorithm was proposed to obtain the tracking controller for completely unknown linear systems,¹⁷ where tracking signals were generated for stabilizing the systems. Moreover, RL and ADP techniques were employed to achieve an optimal output regulation for linear systems.¹⁸

On the other hand, many researchers have applied RL to the robust control of uncertain systems.^19–26 Robust tracking problems arise widely in practical applications. However, to our knowledge, only a few studies have addressed the PI-based robust tracking of uncertain linear systems. Most existing design methods for robust tracking assume that the nominal system is known exactly, which is impractical. Therefore, it is necessary to use the PI algorithm to solve the robust tracking problems of uncertain linear systems with an unknown nominal system matrix. In this study, a PI method is developed to solve the robust tracking control of an uncertain linear system with a polynomial tracking signal. The robust tracking problem is turned into a robust control problem of an augmented linear system. Additionally, online PI algorithms are developed to solve the considered tracking design problem for matched and mismatched systems.

Our main contributions are as follows. First, we consider the tracking problem of general uncertain linear systems in which both the state matrix and the input matrix are uncertain. In the existing literature,¹² a robust tracking-control design was proposed for uncertain linear systems only when the uncertainty enters the system matrix; we extend these results to the case where the input matrix is also uncertain. Second, online PI algorithms are presented to solve tracking problems for matched and mismatched uncertain systems. Because it is practically difficult to obtain the nominal system matrix accurately, using the PI algorithm is advantageous. As a result, we extend the PI algorithm to calculate the robust tracking control law for general uncertain linear systems.

The rest of this paper is arranged as follows. In Section 2, we formulate the robust tracking problem and present some basic results; solving the robust tracking problem is converted into calculating a robust control law for an augmented uncertain system. In Sections 3 and 4, the robust tracking problems for matched and mismatched linear systems, respectively, are solved by transforming them into robust control problems of an augmented system. Online PI algorithms are developed for the augmented uncertain system based on the optimal control of an auxiliary linear system. To support the proposed theoretical framework, we provide numerical experiments with two examples in Section 5. In Section 6, the study is concluded, and the scope for future research is discussed.

Robust tracking control framework

Consider a continuous-time linear system with uncertainty as follows:

\begin{matrix} \overset{\cdot}{x} = A (s) x + B (l) u \\ y = Cx, \end{matrix}

(1)

where $x \in R^{n}$ is the state vector, $u \in R^{m}$ is the input variable, $y \in R^{1}$ is the system output, $s \in S$ and $l \in L$ are the uncertain parameter vectors, and $S$ and $L$ are the sets of uncertain parameters. $A (s)$ is an $n \times n$ uncertain state matrix, $B (l)$ is an $n \times m$ uncertain input matrix, and C is a $1 \times n$ constant output matrix.

The objective of the control design is to establish a control input, $u = Kx$ , such that the system output y asymptotically tracks the desired reference signal, $y_{r}$ , for all $s \in S$ and $l \in L$ . The reference signal is assumed to be a polynomial time signal, $y_{r} = a_{0} + a_{1} t + \dots + a_{d - 1} t^{d - 1}$ , where $a_{0}, a_{1}, \dots, a_{d - 1}$ are constants, and d is a positive integer. More precisely, the control design goal is to establish a control input, $u = Kx$ , such that the closed-loop uncertain system (1) is asymptotically stable and the output, $y = Cx$ , asymptotically tracks the signal, $y_{r}$ , for all $s \in S$ and $l \in L$ .

Some definitions and basic assumptions^1,27 are elaborated as follows.

Assumption 1. There exist nominal values $s_{0} \in S$ and $l_{0} \in L$ such that $(A (s_{0}), B (l_{0}))$ is stabilizable and $(A (s_{0}), C)$ is observable.

Definition 1. System (1) satisfies the matched condition in the system matrix if, for every $s \in S$ , there is a matrix, $φ (s)$ , such that

A (s) - A (s_{0}) = B (l_{0}) φ (s),

(2)

where $φ (s) \in R^{m \times n}$ .

Definition 2. System (1) satisfies the matched condition in the input matrix if, for every $l \in L$ , there is a matrix, $\bar{φ} (l)$ , such that

B (l) - B (l_{0}) = B (l_{0}) \bar{φ} (l),

(3)

where $\bar{φ} (l) \in R^{m \times m}$ , and $\bar{φ} (l) \geq 0$ .

Definition 3. When system (1) satisfies conditions (2) and (3), for all $s \in S$ and $l \in L$ , it is called a matched uncertain linear system.

Assumption 2. For the linear system (1) with matching uncertainties satisfying conditions (2) and (3), there is a known matrix, $M \geq 0$ , such that $φ^{T} (s) φ (s) \leq M$ for every $s \in S$ .

If system (1) does not satisfy matching conditions (2) and (3), the pseudo-inverse, $B (l_{0})^{+}$ , of the nominal input matrix, $B (l_{0})$ , is introduced to decompose the uncertainties in the system matrix and the input matrix into the following matched and mismatched parts:

\begin{matrix} A (s) - A (s_{0}) = B (l_{0}) B (l_{0})^{+} [A (s) - A (s_{0})] \\ + [I - B (l_{0}) B {(l_{0})}^{+}] [A (s) - A (s_{0})], \end{matrix}

(4)

and

\begin{matrix} B (l) - B (l_{0}) = B (l_{0}) B (l_{0})^{+} [B (l) - B (l_{0})] \\ + [I - B (l_{0}) B {(l_{0})}^{+}] [B (l) - B (l_{0})], \end{matrix}

(5)

where $B (l_{0})^{+} = {[B^{T} (l_{0}) B (l_{0})]}^{- 1} B^{T} (l_{0})$ .
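As a rough numerical illustration of decompositions (4) and (5), the sketch below splits an uncertainty matrix into its matched and mismatched parts. The function name `split_uncertainty` and all numbers are illustrative assumptions, not data from this study. `numpy.linalg.pinv` agrees with the formula for $B (l_{0})^{+}$ above when $B (l_{0})$ has full column rank.

```python
import numpy as np

def split_uncertainty(delta, B0):
    """Split delta (e.g. A(s) - A(s0)) into matched and mismatched parts.

    matched    = B0 B0^+ delta        (lies in the range of B0)
    mismatched = (I - B0 B0^+) delta  (orthogonal to the range of B0)
    """
    projector = B0 @ np.linalg.pinv(B0)
    matched = projector @ delta
    mismatched = (np.eye(B0.shape[0]) - projector) @ delta
    return matched, mismatched

# Hypothetical nominal input matrix and uncertainty A(s) - A(s0).
B0 = np.array([[0.0], [1.0]])
delta_A = np.array([[0.2, 0.0], [0.5, -0.3]])

matched, mismatched = split_uncertainty(delta_A, B0)
print("matched part:\n", matched)
print("mismatched part:\n", mismatched)
```

Here only the second row of the uncertainty can be reached through the input channel, so the first row ends up in the mismatched part.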

If system (1) is a mismatched uncertain system, let the uncertainty of the system be subject to the following assumptions.

Assumption 3. A positive semidefinite matrix, F, exists, such that

{[A (s) - A (s_{0})]}^{T} [B (l_{0})^{+}]^{T} B (l_{0})^{+} [A (s) - A (s_{0})] \leq F .

(6)

Assumption 4. A positive semidefinite matrix, H, exists, such that

{[A (s) - A (s_{0})]}^{T} [A (s) - A (s_{0})] \leq H .

(7)

Assumption 5. A positive semidefinite matrix, G, exists, such that

{[A (s) - A (s_{0})]}^{T} C^{T} C [A (s) - A (s_{0})] \leq G .

The above assumptions are common in the study of uncertain systems^1,26,27.

To solve the robust tracking problem, we introduce a new variable, $e = y - y_{r}$ , as an error. Based on the integrals of the error variable, the following new state variables are defined:

{\overset{\cdot}{q}}_{1} = e, {\overset{\cdot}{q}}_{2} = q_{1}, \dots, {\overset{\cdot}{q}}_{d} = q_{d - 1}

Now, define the augmented system state as

X (t) = {[\begin{matrix} x^{T} (t) & q_{1} & q_{2} & \dots & q_{d} \end{matrix}]}^{T}

Using the augmented system state, an uncertain augmented linear system is constructed as

\overset{\cdot}{X} = T (s) X + \bar{B} (l) u + N y_{r},

(8)

where

\begin{matrix} T (s) & = [\begin{matrix} A (s) & O_{n \times 1} & O_{n \times 1} & \dots & O_{n \times 1} & O_{n \times 1} \\ C & 0 & 0 & \dots & 0 & 0 \\ O_{1 \times n} & 1 & 0 & \dots & 0 & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ O_{1 \times n} & 0 & 0 & \dots & 1 & 0 \end{matrix}] \in R^{(n + d) \times (n + d)}, \\ \bar{B} (l) & = [\begin{matrix} B (l) \\ O_{d \times m} \end{matrix}], N = [\begin{matrix} O_{n \times 1} \\ - 1 \\ 0 \\ ⋮ \\ 0 \end{matrix}] . \end{matrix}

Here, $O_{i \times j} \in R^{i \times j}$ represents a zero matrix. Equation (8) can be regarded as a linear nonhomogeneous uncertain system with a nonhomogeneous term, $N y_{r}$ . The considered robust tracking problem could be converted to the following robust stabilization problem. For every $s \in S$ and $l \in L$ , find a control input, $u = KX$ , such that the augmented homogeneous uncertain linear system

\overset{\cdot}{X} = T (s) X + \bar{B} (l) u

(9)

is asymptotically stable.
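The construction of the augmented matrices in (8) can be sketched in a few lines of NumPy. The helper name `build_augmented` and the example matrices are assumptions made for illustration only.

```python
import numpy as np

def build_augmented(A, B, C, d):
    """Build T, B_bar and N of the augmented system (8).

    State ordering: X = [x; q_1; ...; q_d] with
    q_1' = C x - y_r and q_{k+1}' = q_k for k = 1, ..., d - 1.
    """
    n, m = B.shape
    T = np.zeros((n + d, n + d))
    T[:n, :n] = A
    T[n, :n] = np.ravel(C)            # row of q_1: C x
    for k in range(1, d):
        T[n + k, n + k - 1] = 1.0     # row of q_{k+1}: q_k
    B_bar = np.vstack([B, np.zeros((d, m))])
    N = np.zeros((n + d, 1))
    N[n, 0] = -1.0                    # -y_r enters the q_1 equation
    return T, B_bar, N

# Hypothetical second-order plant tracking a ramp (d = 2).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
T, B_bar, N = build_augmented(A, B, C, d=2)
print(T)
```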

Lemma 1. If system (1) satisfies conditions (2) and (3), then the augmented uncertain system (9) is a matched uncertain linear system.

Proof. Assume that system (1) satisfies the system matrix matched condition (2) and the input matrix matched condition (3). Consequently, $A (s) - A (s_{0}) = B (l_{0}) φ (s)$ . Therefore,

\begin{matrix} T (s) - T (s_{0}) = [\begin{matrix} A (s) - A (s_{0}) & O_{n \times d} \\ O_{d \times n} & O_{d \times d} \end{matrix}] \\ = [\begin{matrix} B (l_{0}) \\ O_{d \times m} \end{matrix}] [\begin{matrix} φ (s) & O_{m \times d} \end{matrix}] \\ = \bar{B} (l_{0}) Φ (s), \end{matrix}

where $\bar{B} (l_{0}) = [\begin{matrix} B (l_{0}) \\ O_{d \times m} \end{matrix}] \in R^{(n + d) \times m}$ , and $Φ (s) = [\begin{matrix} φ (s) & O_{m \times d} \end{matrix}] \in R^{m \times (n + d)}$ . This implies that the augmented system (9) satisfies the system matrix-matched condition. Moreover,

\begin{matrix} \bar{B} (l) - \bar{B} (l_{0}) = [\begin{matrix} B (l) - B (l_{0}) \\ O_{d \times m} \end{matrix}] \\ = [\begin{matrix} B (l_{0}) \bar{φ} (l) \\ O_{d \times m} \end{matrix}] = \bar{B} (l_{0}) \bar{φ} (l), \end{matrix}

which implies that the augmented system (9) satisfies the input matrix-matched condition. Hence, the proof is complete.

Lemma 2. Denote the new state as $q = {[\begin{matrix} q_{1} & q_{2} & \dots & q_{d} \end{matrix}]}^{T}$ and partition the gain as $K = [\begin{matrix} \bar{K} & K_{1} \end{matrix}]$ , so that $KX = \bar{K} x + K_{1} q$ . If, for all $s \in S, l \in L$ , the state feedback controller $u = KX$ stabilizes the system $\overset{\cdot}{X} = T (s) X + \bar{B} (l) u$ , then $u = \bar{K} x$ will stabilize system (1), and $y \to y_{r}$ as $t \to \infty$ .

Proof. Because (9) is the augmented system corresponding to system (1), $u = \bar{K} x$ will stabilize system (1). Substituting $u = KX$ into the augmented system (8), differentiating both sides d times, and noting that $y_{r}^{(d)} = 0$ because $y_{r}$ is a polynomial of degree at most $d - 1$ , yields

\begin{matrix} X^{(d + 1)} = T (s) X^{(d)} + \bar{B} (l) K X^{(d)} \\ = (T (s) + \bar{B} (l) K) X^{(d)} . \end{matrix}

(10)

Considering that $u = KX$ can stabilize system (9), the matrix $T (s) + \bar{B} (l) K$ is Hurwitz stable. It follows from (10) that $X^{(d)} \to 0$ as $t \to \infty$ . This implies that $q_{d}^{(d)} = e \to 0$ as $t \to \infty$ . Therefore, we have $y \to y_{r}$ as $t \to \infty$ . Hence, the proof is complete.

Lemma 2 shows that the robust tracking problem of system (1) can be transformed to a robust stabilization problem of the augmented homogeneous system (9).
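As a worked special case of the argument in Lemma 2, take a constant reference ($d = 1$ , $y_{r} = a_{0}$ ). The steps below only restate the proof for this case under the closed-loop control $u = KX$ .

```latex
\begin{aligned}
d = 1:\quad & y_r = a_0, \qquad X = [\, x^{T} \;\; q_1 \,]^{T}, \qquad \dot{q}_1 = e = Cx - a_0, \\
& \dot{X} = (T(s) + \bar{B}(l) K) X + N a_0, \\
& \ddot{X} = (T(s) + \bar{B}(l) K) \dot{X} \qquad (\text{since } \dot{y}_r = 0), \\
& \dot{X} \to 0 \;\Rightarrow\; \dot{q}_1 = e \to 0 \;\Rightarrow\; y \to a_0 .
\end{aligned}
```

The single integrator state $q_{1}$ therefore removes the steady-state error for a constant reference; each additional power of t in $y_{r}$ requires one more integrator state.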

Remark 1. Tan et al.¹² considered the robust tracking of an uncertain linear system without input uncertainty. In this study, we considered a system with uncertainties entering the input matrix, which is an extension of the existing study outcomes.

Here, the system is divided into matched and mismatched cases, and the robust tracking problem of the system is discussed.

Matched uncertain linear systems

In this section, PI algorithms are developed to calculate the robust tracking control law for linear systems with matched uncertainties. The problem is transformed into stabilizing an augmented uncertain system whose state contains the original system state and the integrals of the tracking error. By solving an optimal control problem for the augmented nominal system with a predefined performance index, the PI algorithms yield the robust tracking feedback control.

Robust stabilization of an augmented linear system with uncertainty

Here, we discuss system (1) when it satisfies the matched conditions. Robust stabilization of the augmented uncertain system is transformed into calculating an optimal control for the nominal system with a predefined performance index. The optimal control problem is solved using the PI method, and the required robust tracking control law is obtained.

For the nominal linear system

\overset{\cdot}{X} = T (s_{0}) X + \bar{B} (l_{0}) u,

(11)

we construct the following optimal control problem: find a controller, $u = KX$ , that minimizes the performance index

V (X (t_{0}), u (.)) = \int_{t_{0}}^{\infty} [X^{T} QX + u^{T} u] dt,

(12)

where $Q = \bar{M} + I \geq 0$ , $\bar{M}$ is an upper bound of the uncertainty term $Φ^{T} (s) Φ (s)$ , and I is an identity matrix of proper dimensions. Specifically, $\bar{M} = [\begin{matrix} M & O \\ O & O \end{matrix}]$ , which follows from the derivation

\begin{matrix} Φ^{T} (s) Φ (s) = {[\begin{matrix} φ (s) & O_{m \times d} \end{matrix}]}^{T} [\begin{matrix} φ (s) & O_{m \times d} \end{matrix}] \\ = [\begin{matrix} φ^{T} (s) φ (s) & O \\ O & O \end{matrix}] \\ \leq [\begin{matrix} M & O \\ O & O \end{matrix}] \\ = \bar{M} \end{matrix}

Here, O denotes a zero matrix of appropriate dimensions.

According to optimal control theory,²⁶ $u = - {\bar{B}}^{T} (l_{0}) PX$ is the solution of the optimal control problem (11) and (12), where the positive definite matrix P satisfies the following ARE:

T^{T} (s_{0}) P + PT (s_{0}) + Q - P \bar{B} (l_{0}) {\bar{B}}^{T} (l_{0}) P = 0 .

(13)

Theorem 1. Denote $K = - {\bar{B}}^{T} (l_{0}) P$ . If system (1) is a matched uncertain system satisfying conditions (2) and (3), then the solution, $u = KX$ , of the optimal control problem (11) and (12) stabilizes the augmented uncertain system (9). That is, for every $s \in S, l \in L$ , the closed-loop system,

\overset{\cdot}{X} = [T (s) + \bar{B} (l) K] X

(14)

is asymptotically stable.

Proof. It follows from Lemma 1 that the augmented system (9) satisfies the following matching conditions:

T (s) - T (s_{0}) = \bar{B} (l_{0}) Φ (s)

(15)

and

\bar{B} (l) - \bar{B} (l_{0}) = \bar{B} (l_{0}) \bar{φ} (l)

(16)

Choosing the Lyapunov function as $V (X) = X^{T} PX$ and taking its time derivative, along the closed-loop system (14), we obtain

\begin{matrix} \frac{dV}{dt} = {\overset{\cdot}{X}}^{T} PX + X^{T} P \overset{\cdot}{X} \\ = X^{T} [T (s) + \bar{B} (l) K]^{T} PX + X^{T} P [T (s) + \bar{B} (l) K] X . \end{matrix}

By matched conditions (15) and (16),

\begin{matrix} \frac{dV}{dt} = X^{T} [T^{T} (s_{0}) P + PT (s_{0}) + Φ^{T} (s) {\bar{B}}^{T} (l_{0}) P \\ + K^{T} {\bar{B}}^{T} (l_{0}) P + K^{T} {\bar{φ}}^{T} (l) {\bar{B}}^{T} (l_{0}) P \\ + P \bar{B} (l_{0}) Φ (s) + P \bar{B} (l_{0}) K + P \bar{B} (l_{0}) \bar{φ} (l) K] X . \end{matrix}

It follows from the optimal control gain, $K = - {\bar{B}}^{T} (l_{0}) P$ , and Riccati equation (13), that

\begin{matrix} \frac{dV}{dt} = - X^{T} \bar{M} X - X^{T} X - X^{T} K^{T} KX \\ - 2 X^{T} K^{T} Φ (s) X - 2 X^{T} K^{T} \bar{φ} (l) KX \\ = - X^{T} \bar{M} X - X^{T} X - X^{T} K^{T} KX \\ - 2 X^{T} K^{T} Φ (s) X - 2 X^{T} K^{T} \bar{φ} (l) KX \\ - X^{T} Φ^{T} (s) Φ (s) X + X^{T} Φ^{T} (s) Φ (s) X \\ = - X^{T} (\bar{M} - Φ^{T} (s) Φ (s)) X - 2 X^{T} K^{T} \bar{φ} (l) KX \\ - X^{T} (K + Φ (s))^{T} (K + Φ (s)) X - X^{T} X . \end{matrix}

Because $Φ^{T} (s) Φ (s) \leq \bar{M}$ and $\bar{φ} (l) \geq 0$ , every term on the right-hand side is nonpositive, and $- X^{T} X < 0$ for $X \neq 0$ . We immediately conclude that $\frac{dV}{dt} < 0$ for all $X \neq 0$ . Therefore, for all $s \in S, l \in L$ , closed-loop system (14) is asymptotically stable. Hence, the proof is complete.
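Theorem 1 can be checked numerically on a toy problem. The sketch below is illustrative only: the plant (a double integrator with one integrator state, $d = 1$ ), the uncertainty ranges, and the helper names are all assumptions. The ARE (13) is solved here through the stable invariant subspace of the Hamiltonian matrix, which is a standard textbook technique and not a method taken from this study.

```python
import numpy as np

def solve_are(T0, B, Q):
    """Stabilizing solution of T0'P + P T0 + Q - P B B' P = 0,
    obtained from the stable eigenvectors of the Hamiltonian matrix."""
    n = T0.shape[0]
    ham = np.block([[T0, -B @ B.T], [-Q, -T0.T]])
    eigvals, eigvecs = np.linalg.eig(ham)
    basis = eigvecs[:, eigvals.real < 0]          # n stable directions
    P = np.real(basis[n:, :] @ np.linalg.inv(basis[:n, :]))
    return 0.5 * (P + P.T)

# Assumed nominal augmented system: X = [x1, x2, q1].
T0 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0]])
B0 = np.array([[0.0], [1.0], [0.0]])

def T_of(s):
    # Matched uncertainty: T(s) - T(s0) = B0 @ Phi(s), Phi(s) = [s, 0, 0].
    return T0 + B0 @ np.array([[s, 0.0, 0.0]])

def B_of(l):
    # Matched input uncertainty: B(l) - B(l0) = B0 * l with l >= 0.
    return (1.0 + l) * B0

M_bar = np.diag([1.0, 0.0, 0.0])      # bounds Phi' Phi for |s| <= 1
Q = M_bar + np.eye(3)
P = solve_are(T0, B0, Q)
K = -B0.T @ P

# Largest closed-loop real part over a grid of uncertain parameters.
worst = max(np.linalg.eigvals(T_of(s) + B_of(l) @ K).real.max()
            for s in np.linspace(-1.0, 1.0, 9)
            for l in np.linspace(0.0, 0.5, 6))
print("gain K:", K)
print("largest real part over the grid:", worst)
```

A grid check of this kind does not replace the Lyapunov argument; it only confirms, for sampled parameters, what the theorem guarantees for the whole uncertainty set.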

PI algorithm and its convergence

This subsection has two parts. First, based on the performance function (12), an offline policy iteration algorithm is proposed to solve the robust tracking problem. Then, an integral reinforcement formula is developed based on the Bellman equation. An online real-time PI algorithm is developed to solve the robust tracking problem with unknown nominal system matrix.

For any initial time, t, rewrite the performance function (12) as

X^{T} PX = \int_{t}^{\infty} [X^{T} QX + u^{T} u] dt

(17)

Differentiating (17) along the trajectories of system (11) with $u = KX$ yields

H^{T} P + PH + Q + K^{T} K = 0,

(18)

where $H = T (s_{0}) + \bar{B} (l_{0}) K$ .

Using (18) and $K = - {\bar{B}}^{T} (l_{0}) P$ , an offline PI algorithm for the robust tracking problem is obtained.

Algorithm 1. Offline PI algorithm for robust tracking of matched uncertain linear systems

Initialization: Select an initial stabilizing control gain, $K_{0}$ .

Policy evaluation: For a given control gain, $K_{i}$ , solve for $P_{i}$ from

H_{i}^{T} P_{i} + P_{i} H_{i} + Q + K_{i}^{T} K_{i} = 0,

(19)

where $H_{i} = T (s_{0}) + \bar{B} (l_{0}) K_{i}$ .

Policy improvement: Compute $K_{i + 1}$ using

K_{i + 1} = - {\bar{B}}^{T} (l_{0}) P_{i} .

(20)

By alternately iterating (19) and (20), Algorithm 1 can be used to calculate the robust tracking law of uncertain systems. It is an extension of Kleinman’s algorithm,²⁸ and its convergence proof is identical to that of Kleinman’s algorithm.
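A minimal sketch of Algorithm 1 is given below. The Lyapunov equation (19) is solved by vectorization with a Kronecker product, which is one convenient choice for small problems; the example matrices and the initial gain are assumptions chosen so that $T (s_{0}) + \bar{B} (l_{0}) K_{0}$ is stable.

```python
import numpy as np

def solve_lyapunov(H, W):
    """Solve H'P + P H + W = 0 for symmetric P via vectorization."""
    n = H.shape[0]
    eye = np.eye(n)
    op = np.kron(eye, H.T) + np.kron(H.T, eye)   # acts on column-major vec(P)
    p = np.linalg.solve(op, -W.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)

def offline_pi(T0, B0, Q, K0, max_iter=50, tol=1e-10):
    """Alternate policy evaluation (19) and policy improvement (20)."""
    K = K0
    for _ in range(max_iter):
        H = T0 + B0 @ K
        P = solve_lyapunov(H, Q + K.T @ K)       # policy evaluation
        K_new = -B0.T @ P                        # policy improvement
        converged = np.linalg.norm(K_new - K) < tol
        K = K_new
        if converged:
            break
    return P, K

# Assumed nominal augmented system (double integrator, d = 1).
T0 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0]])
B0 = np.array([[0.0], [1.0], [0.0]])
Q = np.diag([2.0, 1.0, 1.0])                     # M_bar + I with M = diag(1, 0)
K0 = np.array([[-3.0, -3.0, -1.0]])              # places all poles at -1

P, K = offline_pi(T0, B0, Q, K0)
residual = T0.T @ P + P @ T0 + Q - P @ B0 @ B0.T @ P
print("ARE residual norm:", np.linalg.norm(residual))
```

The residual of ARE (13) serves as a simple convergence check, because the fixed point of (19) and (20) is exactly the ARE solution.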

In Algorithm 1, ARE (13) is solved by iteratively computing (19) and (20). However, the algorithm requires knowledge of the nominal system, so the controller can only be computed offline. Here, an online PI algorithm is developed to solve ARE (13) with an unknown nominal system matrix, $T (s_{0})$ .

For any initial time, t, the optimal cost in (12) can be rewritten as

\begin{matrix} V (X (t)) = \int_{t}^{\infty} [X^{T} QX + u^{T} u] dt \\ = \int_{t}^{t + Δ t} [X^{T} QX + u^{T} u] dt \\ + \int_{t + Δ t}^{\infty} [X^{T} QX + u^{T} u] dt, \end{matrix}

that is,

V (X (t)) = \int_{t}^{t + Δ t} [X^{T} QX + u^{T} u] dt + V (X (t + Δ t))

Hence,

\begin{matrix} X^{T} (t) PX (t) = \int_{t}^{t + Δ t} [X^{T} QX + u^{T} u] dt \\ + X^{T} (t + Δ t) PX (t + Δ t) . \end{matrix}

(21)

In equation (21), only the matrix P is unknown, and its elements can be calculated from system trajectory data. Consequently, an online PI algorithm solving the robust tracking problem for uncertain linear systems is obtained.

Algorithm 2. The online PI algorithm for robust tracking of matched uncertain linear systems

Initialization: Select an initial stabilizing control gain, $K_{0}$ .

Policy evaluation: Compute $P_{i}$ from

\begin{matrix} X^{T} (t) P_{i} X (t) = \int_{t}^{t + Δ t} [X^{T} QX + X^{T} K_{i}^{T} K_{i} X] dt \\ + X^{T} (t + Δ t) P_{i} X (t + Δ t) . \end{matrix}

(22)

Policy improvement: Compute $K_{i + 1}$ using

K_{i + 1} = - {\bar{B}}^{T} (l_{0}) P_{i} .

(23)

Remark 2. Algorithm 2 is an online PI algorithm based on RL. Using the online trajectory data of system (11), the matrix $P_{i}$ can be solved from (22) by the least-squares method. Iterating (22) and (23) yields the robust control gain, K, for the augmented uncertain linear system (9). Moreover, according to Lemma 2, the robust tracking law is expressed as $u = \bar{K} x$ by decomposing the robust control law into $KX = \bar{K} x + K_{1} q$ . An advantage of Algorithm 2 is that it does not require the nominal system matrix, $T (s_{0})$ , and can effectively avoid the curse of dimensionality. In a previous study,¹³ online PI was used to compute the linear quadratic regulator with an unknown system matrix; we extend this algorithm to solve the robust tracking problem for an uncertain linear system.
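The least-squares step of Algorithm 2 can be sketched as follows. Several points are assumptions made for this illustration: the function `plant` stands in for the physical system (the learner only calls it and never reads $T (s_{0})$ ), the data come from several short runs started at different random initial states so that the regression is well conditioned, and the running cost is integrated with a fixed-step Runge-Kutta scheme.

```python
import numpy as np

def quad_basis(X):
    """Monomials with X'PX = quad_basis(X) @ p, p = upper triangle of P."""
    n = X.size
    return np.array([(1.0 if i == j else 2.0) * X[i] * X[j]
                     for i in range(n) for j in range(i, n)])

def run_interval(plant, K, W, X0, dt, steps=100):
    """RK4 over [t, t + dt]; returns X(t + dt) and the integral of X'WX."""
    def f(z):
        X = z[:-1]
        return np.append(plant(X, K @ X), X @ W @ X)
    z = np.append(X0, 0.0)
    h = dt / steps
    for _ in range(steps):
        k1 = f(z)
        k2 = f(z + 0.5 * h * k1)
        k3 = f(z + 0.5 * h * k2)
        k4 = f(z + h * k3)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z[:-1], z[-1]

def online_pi(plant, B0, Q, K0, X_inits, dt=0.5, iters=8):
    n = B0.shape[0]
    upper = np.triu_indices(n)
    K = K0
    for _ in range(iters):
        W = Q + K.T @ K
        rows, costs = [], []
        for X0 in X_inits:
            X1, cost = run_interval(plant, K, W, X0, dt)
            # (22): X(t)'P X(t) - X(t+dt)'P X(t+dt) = integral of X'WX
            rows.append(quad_basis(X0) - quad_basis(X1))
            costs.append(cost)
        p = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)[0]
        P = np.zeros((n, n))
        P[upper] = p
        P = P + P.T - np.diag(np.diag(P))
        K = -B0.T @ P                              # (23)
    return P, K

# Assumed nominal plant, used only to generate data.
T0 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B0 = np.array([[0.0], [1.0], [0.0]])

def plant(X, u):
    return T0 @ X + B0 @ u

Q = np.diag([2.0, 1.0, 1.0])
K0 = np.array([[-3.0, -3.0, -1.0]])                # stabilizing initial gain
X_inits = np.random.default_rng(0).standard_normal((12, 3))
P, K = online_pi(plant, B0, Q, K0, X_inits)
print("learned gain K:", K)
```

Note that the update (23) still uses the nominal input matrix $\bar{B} (l_{0})$ ; only $T (s_{0})$ is avoided.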

The convergence of Algorithm 2 is established as follows.

Theorem 2. Assume that $T (s_{0}) + \bar{B} (l_{0}) K_{i}$ is stable. Then solving for $P_{i}$ from equation (22) is equivalent to solving the following equation:

\begin{matrix} [T (s_{0}) + \bar{B} (l_{0}) K_{i}]^{T} P_{i} + P_{i} [T (s_{0}) + \bar{B} (l_{0}) K_{i}] \\ + Q + K_{i}^{T} K_{i} = 0 \end{matrix}

(24)

Proof. Dividing by $Δ t$ on both sides of (22) and taking a limit, we attain

\begin{matrix} 0 = \lim_{Δ t \to 0} \frac{X^{T} (t + Δ t) P_{i} X (t + Δ t) - X^{T} (t) P_{i} X (t)}{Δ t} \\ + \lim_{Δ t \to 0} \frac{\int_{t}^{t + Δ t} X^{T} (Q + K_{i}^{T} K_{i}) X dt}{Δ t} \\ = \frac{d [X^{T} (t) P_{i} X (t)]}{dt} + \lim_{Δ t \to 0} \frac{d}{d Δ t} \int_{t}^{t + Δ t} X^{T} (Q + K_{i}^{T} K_{i}) X dt \\ = X^{T} [T (s_{0}) + \bar{B} (l_{0}) K_{i}]^{T} P_{i} X \\ + X^{T} P_{i} [T (s_{0}) + \bar{B} (l_{0}) K_{i}] X \\ + X^{T} (Q + K_{i}^{T} K_{i}) X . \end{matrix}

Thus, (22) implies (24).

Conversely, consider the asymptotically stable system $\overset{\cdot}{X} = (T (s_{0}) + \bar{B} (l_{0}) K_{i}) X$ . Selecting the Lyapunov function as $V_{i} (X) = X^{T} P_{i} X$ and taking its time derivative along the trajectories of this system yields

\begin{matrix} \frac{d}{dt} (X^{T} P_{i} X) = X^{T} [(T (s_{0}) + \bar{B} (l_{0}) K_{i})^{T} P_{i} \\ + P_{i} (T (s_{0}) + \bar{B} (l_{0}) K_{i})] X . \end{matrix}

By (24), the right-hand side equals $- X^{T} (Q + K_{i}^{T} K_{i}) X$ . Integrating both sides from t to $t + Δ t$ leads to

\begin{matrix} \int_{t}^{t + Δ t} X^{T} (Q + K_{i}^{T} K_{i}) Xd τ = X^{T} (t) P_{i} X (t) \\ - X^{T} (t + Δ t) P_{i} X (t + Δ t), \end{matrix}

which is (22). Hence, the proof is complete.

Therefore, the integral reinforcement relation (22) is equivalent to equation (19) in Algorithm 1, which is an extension of Kleinman’s algorithm. This indicates the convergence of Algorithm 2.

Mismatched uncertain linear system

The robust tracking problem of the mismatched uncertain linear system (1) is discussed in this section; that is, we consider the design of a robust tracking control law when system (1) does not satisfy the matched conditions (2) and (3). The robust tracking problem is transformed into designing a robust control law for an augmented uncertain linear system. By iteratively calculating the optimal control law for an extended nominal system with a properly defined performance index, the online PI method is used to obtain the robust tracking control law.

Obviously, the augmented system (9) is a mismatched uncertain linear system when the uncertain system (1) does not satisfy conditions (2) and (3). In addition, the uncertainty of the augmented uncertain linear system can be decomposed into a matched part and a mismatched part according to (4) and (5):

\begin{matrix} T (s) - T (s_{0}) = [\begin{matrix} A (s) - A (s_{0}) & O \\ O & O \end{matrix}] \\ = [\begin{matrix} B (l_{0}) B {(l_{0})}^{+} [A (s) - A (s_{0})] & O \\ O & O \end{matrix}] \\ + [\begin{matrix} [I - B (l_{0}) B {(l_{0})}^{+}] [A (s) - A (s_{0})] & O \\ O & O \end{matrix}], \\ = [\begin{matrix} B (l_{0}) B {(l_{0})}^{+} & O \\ O & O \end{matrix}] [\begin{matrix} A (s) - A (s_{0}) & O \\ O & O \end{matrix}] \\ + [\begin{matrix} I - B (l_{0}) B {(l_{0})}^{+} & O \\ O & O \end{matrix}] [\begin{matrix} A (s) - A (s_{0}) & O \\ O & O \end{matrix}] \\ = \bar{B} (l_{0}) \bar{B} (l_{0})^{+} [T (s) - T (s_{0})] \\ + [\bar{I} - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}] [T (s) - T (s_{0})] \end{matrix}

and

\begin{matrix} \bar{B} (l) - \bar{B} (l_{0}) = \bar{B} (l_{0}) \bar{B} (l_{0})^{+} [\bar{B} (l) - \bar{B} (l_{0})] \\ + [I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}] [\bar{B} (l) - \bar{B} (l_{0})], \end{matrix}

where $\bar{B} (l_{0}) = [\begin{matrix} B (l_{0}) \\ O \end{matrix}]$ , $\bar{B} (l_{0})^{+} = [\begin{matrix} B {(l_{0})}^{+} & O \end{matrix}]$ , $\bar{I} = [\begin{matrix} I & O \\ O & O \end{matrix}]$ , and O is a zero matrix of appropriate dimensions.

Lemma 3. If Assumptions 3 and 4 hold, then the uncertainty of the augmented uncertain linear system (9) satisfies the bounded conditions as follows:

[T (s) - T (s_{0})]^{T} [\bar{B} (l_{0})^{+}]^{T} \bar{B} (l_{0})^{+} [T (s) - T (s_{0})] \leq \bar{F}

(25)

[T (s) - T (s_{0})]^{T} [T (s) - T (s_{0})] \leq \bar{H},

(26)

where $\bar{F} = [\begin{matrix} F & O \\ O & O \end{matrix}]$ , $\bar{H} = [\begin{matrix} H & O \\ O & O \end{matrix}]$ .

Proof. Denote $\tilde{A} = A (s) - A (s_{0})$ . From Assumption 3, we can obtain

\begin{matrix} [T (s) - T (s_{0})]^{T} [\bar{B} (l_{0})^{+}]^{T} \bar{B} (l_{0})^{+} [T (s) - T (s_{0})] \\ = [\begin{matrix} {\tilde{A}}^{T} & O \\ O & O \end{matrix}] [\begin{matrix} {[B {(l_{0})}^{+}]}^{T} \\ O \end{matrix}] \\ \times [\begin{matrix} B {(l_{0})}^{+} & O \end{matrix}] [\begin{matrix} \tilde{A} & O \\ O & O \end{matrix}] \\ = [\begin{matrix} {\tilde{A}}^{T} {[B {(l_{0})}^{+}]}^{T} B {(l_{0})}^{+} \tilde{A} & O \\ O & O \end{matrix}] \\ \leq [\begin{matrix} F & O \\ O & O \end{matrix}] \\ = \bar{F} . \end{matrix}

Therefore, (25) holds. In addition, we can achieve

\begin{matrix} [T (s) - T (s_{0})]^{T} [T (s) - T (s_{0})] = [\begin{matrix} {\tilde{A}}^{T} & O \\ O & O \end{matrix}] [\begin{matrix} \tilde{A} & O \\ O & O \end{matrix}] \\ = [\begin{matrix} {\tilde{A}}^{T} \tilde{A} & O \\ O & O \end{matrix}] \\ \leq [\begin{matrix} H & O \\ O & O \end{matrix}] \\ = \bar{H} . \end{matrix}

Therefore, (26) holds, thus completing the proof.
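The bounds (25) and (26) are easy to test numerically for a sampled uncertainty. The sketch below is only an illustration under assumed data; the helper names `pad` and `lemma3_bounds_hold` and the matrices F and H are hypothetical.

```python
import numpy as np

def pad(M, d):
    """Embed an n x n block in the top-left corner of an (n+d) x (n+d) zero matrix."""
    n = M.shape[0]
    out = np.zeros((n + d, n + d))
    out[:n, :n] = M
    return out

def lemma3_bounds_hold(dA, B0, F, H, d, tol=1e-12):
    """Check (25) and (26) for one sample dA = A(s) - A(s0)."""
    n, m = B0.shape
    dT = pad(dA, d)                                    # T(s) - T(s0)
    B_bar_pinv = np.hstack([np.linalg.pinv(B0), np.zeros((m, d))])
    lhs25 = dT.T @ B_bar_pinv.T @ B_bar_pinv @ dT
    lhs26 = dT.T @ dT
    ok25 = np.linalg.eigvalsh(pad(F, d) - lhs25).min() >= -tol
    ok26 = np.linalg.eigvalsh(pad(H, d) - lhs26).min() >= -tol
    return bool(ok25), bool(ok26)

# Assumed sample uncertainty and candidate bounds F, H.
B0 = np.array([[0.0], [1.0]])
dA = np.array([[0.1, 0.0], [0.2, 0.0]])
F = np.diag([0.05, 0.0])
H = np.diag([0.06, 0.0])
print(lemma3_bounds_hold(dA, B0, F, H, d=2))
```

In practice the check would be repeated over a grid or a random sample of $s \in S$ , since a single sample cannot certify the bound over the whole set.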

We construct the following optimal control problem. For an extended nominal linear system,

\overset{\cdot}{X} = T (s_{0}) X + \bar{B} (l_{0}) u + (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}) v,

(27)

find an augmented state feedback law, $u = KX, v = LX$ , in order to minimize the performance index

\int_{0}^{\infty} [X^{T} (\bar{F} + \bar{H} + β^{2} I) X + u^{T} u + v^{T} v] dt,

(28)

where $β \geq 0$ is a design parameter.

We denote $\tilde{B} = [\begin{matrix} \bar{B} (l_{0}) & (I - \bar{B} (l_{0}) \bar{B} {(l_{0})}^{+}) \end{matrix}]$ and $\bar{u} = [\begin{matrix} u \\ v \end{matrix}]$ . Then rewriting (27) yields

\overset{\cdot}{X} = T (s_{0}) X + \tilde{B} \bar{u},

and (28) can be rewritten as

\int_{0}^{\infty} [X^{T} (\bar{F} + \bar{H} + β^{2} I) X + {\bar{u}}^{T} \bar{u}] dt .

By computing the optimal control law, it follows that

[\begin{matrix} u \\ v \end{matrix}] = - {\tilde{B}}^{T} PX = [\begin{matrix} - {\bar{B}}^{T} (l_{0}) P \\ - {(I - \bar{B} (l_{0}) \bar{B} {(l_{0})}^{+})}^{T} P \end{matrix}] X,

(29)

where P is a positive definite matrix solving the following ARE

PT (s_{0}) + T^{T} (s_{0}) P + \bar{F} + \bar{H} + β^{2} I - P \tilde{B} {\tilde{B}}^{T} P = 0

(30)

The following theorem illustrates that optimal control law (29) can stabilize the mismatched uncertain linear system (9).

Theorem 3. Assume that $u = KX$ and $v = LX$ solve the optimal control problem (27) and (28). If the parameter $β$ can be chosen such that the following conditions hold:

\begin{matrix} β^{2} I - 2 L^{T} L > 0, [\bar{B} (l) - \bar{B} (l_{0})] {\bar{B}}^{T} (l_{0}) \geq 0, \end{matrix}

(31)

then the optimal control law, $u = KX$ , with $K = - {\bar{B}}^{T} (l_{0}) P$ can stabilize the mismatched augmented uncertain system (9). That is, for every $s \in S, l \in L$ , the mismatched closed-loop system

\overset{\cdot}{X} = [T (s) + \bar{B} (l) K] X

(32)

is asymptotically stable.

Proof. Choosing the Lyapunov function as $V (X) = X^{T} PX$ and taking its time derivative, along the closed-loop system (32), we obtain

\begin{matrix} \frac{dV}{dt} = X^{T} [T^{T} (s) P + PT (s) + 2 P \bar{B} (l) K] X \\ = X^{T} [(T (s) - T (s_{0}))^{T} P + P (T (s) - T (s_{0}))] X \\ + X^{T} [T^{T} (s_{0}) P + PT (s_{0})] X \\ + X^{T} [2 P \bar{B} (l_{0}) K + 2 P (\bar{B} (l) - \bar{B} (l_{0})) K] X . \end{matrix}

(33)

By decomposing the matched and mismatched parts in the augmented uncertain system (9), we will have

\begin{matrix} \frac{dV}{dt} = X^{T} [T^{T} (s_{0}) P + PT (s_{0})] X \\ + X^{T} [T (s) - T (s_{0})]^{T} ({\bar{B}}^{T} (l_{0}))^{+} {\bar{B}}^{T} (l_{0}) PX \\ + X^{T} [T (s) - T (s_{0})]^{T} (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+})^{T} PX \\ + X^{T} P \bar{B} (l_{0}) \bar{B} (l_{0})^{+} (T (s) - T (s_{0})) X \\ + X^{T} P (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}) (T (s) - T (s_{0})) X \\ + X^{T} [2 P \bar{B} (l_{0}) K + 2 P (\bar{B} (l) - \bar{B} (l_{0})) K] X . \end{matrix}

(34)

According to the ARE (30)

\begin{matrix} \frac{dV}{dt} = - X^{T} (\bar{F} + \bar{H} + β^{2} I) X + X^{T} P [\bar{B} (l_{0}) {\bar{B}}^{T} (l_{0}) \\ + (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}) (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+})^{T}] PX \\ + X^{T} (T (s) - T (s_{0}))^{T} ({\bar{B}}^{T} (l_{0}))^{+} {\bar{B}}^{T} (l_{0}) PX \\ + X^{T} (T (s) - T (s_{0}))^{T} (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+})^{T} PX \\ + X^{T} P \bar{B} (l_{0}) \bar{B} (l_{0})^{+} (T (s) - T (s_{0})) X \\ + X^{T} P (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}) (T (s) - T (s_{0})) X \\ + 2 X^{T} P \bar{B} (l_{0}) KX + 2 X^{T} P (\bar{B} (l) - \bar{B} (l_{0})) KX . \end{matrix}

By $K = - {\bar{B}}^{T} (l_{0}) P$ , we will have

\begin{matrix} \frac{dV}{dt} = - X^{T} (\bar{F} + \bar{H} + β^{2} I) X + X^{T} K^{T} KX \\ + X^{T} P (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}) (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+})^{T} PX \\ - X^{T} (T (s) - T (s_{0}))^{T} ({\bar{B}}^{T} (l_{0}))^{+} KX \\ + X^{T} (T (s) - T (s_{0}))^{T} (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+})^{T} PX \\ - X^{T} K^{T} \bar{B} (l_{0})^{+} (T (s) - T (s_{0})) X \\ + X^{T} P (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}) (T (s) - T (s_{0})) X \\ - 2 X^{T} K^{T} KX - 2 X^{T} P (\bar{B} (l) - \bar{B} (l_{0})) {\bar{B}}^{T} (l_{0}) PX . \end{matrix}

By $L = - (I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+})^{T} P$ , it follows that

\begin{matrix} \frac{dV}{dt} = - X^{T} (\bar{F} + \bar{H} + β^{2} I) X - X^{T} K^{T} KX + X^{T} L^{T} LX \\ - 2 X^{T} K^{T} \bar{B} (l_{0})^{+} (T (s) - T (s_{0})) X \\ - 2 X^{T} (T (s) - T (s_{0}))^{T} LX \\ - 2 X^{T} P (\bar{B} (l) - \bar{B} (l_{0})) {\bar{B}}^{T} (l_{0}) PX . \end{matrix}

Because

\begin{matrix} - X^{T} K^{T} KX - 2 X^{T} K^{T} \bar{B} (l_{0})^{+} (T (s) - T (s_{0})) X \\ = - X^{T} [K + \bar{B} (l_{0})^{+} (T (s) - T (s_{0}))]^{T} \\ \times [K + \bar{B} (l_{0})^{+} (T (s) - T (s_{0}))] X \\ + X^{T} [\bar{B} (l_{0})^{+} (T (s) - T (s_{0}))]^{T} \\ \times [\bar{B} (l_{0})^{+} (T (s) - T (s_{0}))] X \\ \leq X^{T} [\bar{B} (l_{0})^{+} (T (s) - T (s_{0}))]^{T} \\ \times [\bar{B} (l_{0})^{+} (T (s) - T (s_{0}))] X, \end{matrix}

and

\begin{matrix} - 2 X^{T} L^{T} (T (s) - T (s_{0})) X \\ \leq X^{T} (T (s) - T (s_{0}))^{T} (T (s) - T (s_{0})) X + X^{T} L^{T} LX \\ \leq X^{T} L^{T} LX + X^{T} \bar{H} X, \end{matrix}

hence

\begin{matrix} \frac{dV}{dt} = - X^{T} (\bar{F} + \bar{H} + β^{2} I) X + X^{T} L^{T} LX \\ - X^{T} K^{T} KX - 2 X^{T} K^{T} \bar{B} (l_{0})^{+} (T (s) - T (s_{0})) X \\ - 2 X^{T} (T (s) - T (s_{0}))^{T} LX \\ - 2 X^{T} P (\bar{B} (l) - \bar{B} (l_{0})) {\bar{B}}^{T} (l_{0}) PX \\ \leq - X^{T} (\bar{F} + \bar{H} + β^{2} I) X + 2 X^{T} L^{T} LX + X^{T} \bar{H} X \\ + X^{T} [\bar{B} (l_{0})^{+} (T (s) - T (s_{0}))]^{T} \\ \times [\bar{B} (l_{0})^{+} (T (s) - T (s_{0}))] X \\ - 2 X^{T} P (\bar{B} (l) - \bar{B} (l_{0})) {\bar{B}}^{T} (l_{0}) PX \\ \leq - 2 X^{T} P (\bar{B} (l) - \bar{B} (l_{0})) {\bar{B}}^{T} (l_{0}) PX \\ - X^{T} (β^{2} I - 2 L^{T} L) X . \end{matrix}

According to the conditions in (31)

\frac{dV}{dt} < 0 .

Therefore, the mismatched uncertain linear system (9) is asymptotically stable for every $s \in S$ and $l \in L$ .³¹ That is, $u = - {\bar{B}}^{T} (l_{0}) PX$ is a robust controller for the augmented mismatched uncertain system (9). Hence, the proof is complete.

Remark 3. The left pseudo-inverse $\bar{B} (l_{0})^{+} = ({\bar{B}}^{T} (l_{0}) \bar{B} (l_{0}))^{- 1} {\bar{B}}^{T} (l_{0})$ exists if the column vectors of $\bar{B} (l_{0})$ are linearly independent.²⁹ In practical applications, the input matrix $\bar{B} (l_{0})$ usually has full column rank, so this condition is generally satisfied. Furthermore, the pseudo-inverse satisfies $\bar{B} (l_{0})^{+} \bar{B} (l_{0}) = I$ . However, in general, $\bar{B} (l_{0}) \bar{B} (l_{0})^{+} \neq I$ .
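The two properties in Remark 3 can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using the column vector $[0, 1, 0]^{T}$ (the nominal input vector of Example 2) purely as an illustration; the variable names are not from the article.

```python
import numpy as np

# Illustrative full-column-rank input matrix (nominal input vector of Example 2).
B = np.array([[0.0], [1.0], [0.0]])

# Left pseudo-inverse of a full-column-rank matrix: B^+ = (B^T B)^{-1} B^T.
B_pinv = np.linalg.inv(B.T @ B) @ B.T

left_product = B_pinv @ B                # equals the 1x1 identity
right_product = B @ B_pinv               # orthogonal projector onto range(B), not the identity
complement = np.eye(3) - right_product   # projector onto the orthogonal complement of range(B)

print("B^+ =", B_pinv)
print("B^+ B =", left_product)
print("B B^+ =\n", right_product)
print("I - B B^+ =\n", complement)
```

The matrix `complement` is the factor $I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}$ that isolates the mismatched part of the uncertainty in the proof above.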

Here, we propose a robust tracking PI algorithm based on RL for mismatched uncertain linear systems.

For any initial time, t, the optimal performance function (28) can be written as

\begin{matrix} V (X (t)) = \int_{t}^{\infty} [X^{T} \bar{Q} X + {\bar{u}}^{T} \bar{u}] dt \\ = \int_{t}^{t + Δ t} [X^{T} \bar{Q} X + {\bar{u}}^{T} \bar{u}] dt \\ + \int_{t + Δ t}^{\infty} [X^{T} \bar{Q} X + {\bar{u}}^{T} \bar{u}] dt, \end{matrix}

where $\bar{Q} = \bar{F} + \bar{H} + β^{2} I$ .

Similar to Algorithm 2, the following online PI algorithm can be utilized to calculate the robust tracking law of a mismatched uncertain linear system.

Algorithm 3. The online PI algorithm for robust tracking of mismatched uncertain linear systems

Initialization: Select an initial stabilizing control gain, ${\bar{K}}_{0}$ .

Policy evaluation: Compute $P_{i}$ from

\begin{matrix} X^{T} (t) P_{i} X (t) = \int_{t}^{t + Δ t} [X^{T} \bar{Q} X + X^{T} {\bar{K}}_{i}^{T} {\bar{K}}_{i} X] dt \\ + X^{T} (t + Δ t) P_{i} X (t + Δ t) . \end{matrix}

(35)

Policy improvement: Compute ${\bar{K}}_{i + 1}$

{\bar{K}}_{i + 1} = - {\tilde{B}}^{T} P_{i} .

(36)

The convergence proof of the algorithm is similar to that for the matched system and is omitted here.

Remark 4. Robust tracking design is a challenging problem,^30,33 particularly when the signal to be tracked is unstable. We consider the tracking problem for a class of polynomial signals that may be unstable. Algorithm 3 is an online robust control method that can be applied to mismatched uncertain systems with an unknown nominal state matrix. Using the least squares method, the matrix $P_{i}$ can be solved from (35) using the online trajectory data of system (27). Using the PI algorithm from RL to solve this problem is a novel approach. From Theorem 3, if the parameter $β$ is selected to be relatively large, then the conditions in (31) are easily satisfied in many practical applications.
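The least-squares policy evaluation (35) and the policy improvement (36) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation. It uses the nominal data of Example 2 in the next section ( $s_{0} = 0$ , $l_{0} = 1$ , $β = 1$ ). The stacked input matrix `B_tilde` is assumed to be $[\bar{B} (l_{0}), I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}]$ , as implied by the ARE used in the proof above. The plant is imitated by a fixed-step Runge-Kutta simulation in which `A0` is used only to generate trajectory data. The initial gain, the sampling interval, and the set of initial states are hypothetical choices.

```python
import numpy as np

# Nominal data of Example 2. A0 only generates data (it stands in for the plant);
# the learning steps (least squares and gain update) never read it.
A0 = np.array([[0.0, -1.0, 0.0], [2.0, -3.0, 0.0], [1.0, 0.0, 0.0]])
B0 = np.array([[0.0], [1.0], [0.0]])
# Assumption: stacked input matrix [B0, I - B0 B0^+] implied by the ARE in the proof.
B_tilde = np.hstack([B0, np.eye(3) - B0 @ np.linalg.pinv(B0)])
Q_bar = np.array([[5.0, 4.0, 0.0], [4.0, 5.0, 0.0], [0.0, 0.0, 1.0]])
n = 3

def quad_features(x):
    """Features phi(x) such that x^T P x = phi(x) . theta (theta = upper triangle of P)."""
    return np.array([(1.0 if i == j else 2.0) * x[i] * x[j]
                     for i in range(n) for j in range(i, n)])

def unpack(theta):
    """Rebuild the symmetric matrix P from its upper-triangular entries."""
    P = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = theta[k]
            k += 1
    return P

def collect_interval(x, K, dt, substeps=20):
    """Run the plant under u = K x for dt seconds; return x(t + dt) and the integrated cost."""
    Ac = A0 + B_tilde @ K
    W = Q_bar + K.T @ K
    def f(z):
        return np.append(Ac @ z[:n], z[:n] @ W @ z[:n])
    z = np.append(x, 0.0)
    h = dt / substeps
    for _ in range(substeps):  # classical fourth-order Runge-Kutta steps
        k1 = f(z)
        k2 = f(z + 0.5 * h * k1)
        k3 = f(z + 0.5 * h * k2)
        k4 = f(z + h * k3)
        z = z + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z[:n], z[n]

def evaluate_policy(K, starts, dt=0.1, intervals=10):
    """Policy evaluation (35): fit P_i by least squares on sampled trajectory data."""
    rows, costs = [], []
    for x in starts:
        for _ in range(intervals):
            x_next, cost = collect_interval(x, K, dt)
            rows.append(quad_features(x) - quad_features(x_next))
            costs.append(cost)
            x = x_next
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
    return unpack(theta)

# Hypothetical initial states and stabilizing gain (nominal closed loop equals -2 I).
starts = [np.array(v, dtype=float) for v in
          ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1])]
K = np.array([[-2.0, 1.0, 0.0], [-2.0, 1.0, 0.0], [0.0, 0.0, 0.0], [-1.0, 0.0, -2.0]])

for _ in range(10):
    P = evaluate_policy(K, starts)   # policy evaluation (35)
    K = -B_tilde.T @ P               # policy improvement (36)

residual = A0.T @ P + P @ A0 - P @ B_tilde @ B_tilde.T @ P + Q_bar
print("P =\n", P)
print("ARE residual norm =", np.linalg.norm(residual))
```

Several initial states are used so that the regression matrix has full column rank; with a single short trajectory the six unknown entries of $P_{i}$ may be poorly determined. The resulting `P` can be compared with the matrices reported in Example 2.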

Numerical experiments

Two numerical experiments are presented to examine the validity of the theoretical results. The proposed PI algorithms are applied to solve the robust tracking problems of a matched and a mismatched uncertain system.

Example 1. A matched uncertain linear system is considered as follows:

\begin{matrix} \overset{\cdot}{x} = [\begin{matrix} 0 & 2 \\ 2 + s & s \end{matrix}] x + [\begin{matrix} 0 \\ l \end{matrix}] u, \\ y = [\begin{matrix} 1 & 0 \end{matrix}] x, \end{matrix}

(37)

where $s \in [0, 3]$ and $l \in [1, 3]$ are uncertain parameters. The reference signal is assumed to be $y_{r} = t + 5$ . The control objective is to design a state feedback law, $u = Kx$ , such that the system output $y = Cx$ asymptotically tracks the reference signal for every $s \in [0, 3]$ and $l \in [1, 3]$ .

We denote $A (s) = [\begin{matrix} 0 & 2 \\ 2 + s & s \end{matrix}], B (l) = [\begin{matrix} 0 \\ l \end{matrix}]$ , and $C = [1, 0]$ , respectively. Let $s_{0} = 0$ and $l_{0} = 1$ . According to (8), we will have

\begin{matrix} T (s) & = [\begin{matrix} 0 & 2 & 0 & 0 \\ 2 + s & s & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}], \bar{B} (l) = [\begin{matrix} 0 \\ l \\ 0 \\ 0 \end{matrix}], \\ N & = [\begin{matrix} 0 \\ 0 \\ - 1 \\ 0 \end{matrix}] . \end{matrix}

\begin{matrix} T (s) - T (s_{0}) = [\begin{matrix} 0 & 0 & 0 & 0 \\ s & s & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}] \\ = [\begin{matrix} 0 \\ 1 \\ 0 \\ 0 \end{matrix}] [\begin{matrix} s & s & 0 & 0 \end{matrix}] \equiv \bar{B} (l_{0}) Φ (s), \end{matrix}

(38)

and

\begin{matrix} \bar{B} (l) - \bar{B} (l_{0}) = [\begin{matrix} 0 \\ 1 \\ 0 \\ 0 \end{matrix}] (l - 1) \equiv \bar{B} (l_{0}) \bar{φ} (l), \end{matrix}

(39)

which implies that the parameter perturbations in system (37) are matched. Clearly, $\bar{φ} (l) = l - 1 \geq 0$ for any $l \in [1, 3]$ .

According to (38), we will have

\begin{matrix} Φ^{T} (s) Φ (s) = [\begin{matrix} s^{2} & s^{2} & 0 & 0 \\ s^{2} & s^{2} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}] \leq [\begin{matrix} 9 & 9 & 0 & 0 \\ 9 & 9 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}] \\ = \bar{M} . \end{matrix}

(40)

Therefore, the robust tracking problem is finally transformed into solving an optimal control of the nominal system. For the augmented nominal system,

\overset{\cdot}{X} = [\begin{matrix} 0 & 2 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}] X + [\begin{matrix} 0 \\ 1 \\ 0 \\ 0 \end{matrix}] u,

design a state feedback law, $u = KX$ , in order to minimize the performance index

V (X (t_{0}), u (.)) = \int_{t_{0}}^{\infty} [X^{T} QX + u^{T} u] dt,

(41)

where

Q = \bar{M} + I = [\begin{matrix} 10 & 9 & 0 & 0 \\ 9 & 10 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}] .

(42)

By applying Algorithm 2, a robust tracking-control gain is obtained for system (37). We chose the initial control gain as $[- 14.5, - 8, - 20, - 6]$ . The initial condition of the augmented nominal system was selected as $X_{0} = [0.2, 0, 0.15, 0]^{T}$ . Using MATLAB, after four iterations, the positive definite matrix P and the control gain converge to the following solutions:

P = [\begin{matrix} 13.4446 & 8.0880 & 11.4490 & 3.2698 \\ 8.0880 & 6.5092 & 3.6855 & 1.0071 \\ 11.4490 & 3.6855 & 18.9043 & 6.0825 \\ 3.2698 & 1.0071 & 6.0825 & 3.6346 \end{matrix}],

and

K = [\begin{matrix} - 8.0880 & - 6.5092 & - 3.6855 & - 1.0071 \end{matrix}] .

For comparison, we used MATLAB to solve the ARE directly, which corresponds to the implementation of Algorithm 1. The following P matrix is obtained:

P = [\begin{matrix} 13.3014 & 8.0402 & 11.2417 & 3.2466 \\ 8.0402 & 6.4931 & 3.6167 & 1.0000 \\ 11.2417 & 3.6167 & 18.5987 & 6.0402 \\ 3.2466 & 1.0000 & 6.0402 & 3.6167 \end{matrix}] .
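The ARE solution can be cross-checked with a short model-based sketch. The code below is not the authors' MATLAB code. It assumes NumPy and applies Kleinman's iteration²⁸ to the augmented nominal system with the weight Q in (42), starting from the initial gain quoted in the text. The helper `lyapunov_solve` is a hypothetical name, implemented here through Kronecker-product vectorization.

```python
import numpy as np

def lyapunov_solve(Ac, W):
    """Solve Ac^T P + P Ac + W = 0 for symmetric P via Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    M = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # remove rounding asymmetry

# Augmented nominal system of Example 1 (s0 = 0, l0 = 1) and weight Q from (42).
A0 = np.array([[0.0, 2.0, 0.0, 0.0],
               [2.0, 0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
B0 = np.array([[0.0], [1.0], [0.0], [0.0]])
Q = np.array([[10.0, 9.0, 0.0, 0.0],
              [9.0, 10.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

# Initial stabilizing gain quoted in the text (control law u = K X).
K = np.array([[-14.5, -8.0, -20.0, -6.0]])

for _ in range(30):
    Ac = A0 + B0 @ K                       # closed loop under the current policy
    P = lyapunov_solve(Ac, Q + K.T @ K)    # policy evaluation (Lyapunov equation)
    K = -B0.T @ P                          # policy improvement

residual = A0.T @ P + P @ A0 - P @ B0 @ B0.T @ P + Q
print("P =\n", np.round(P, 4))
print("K =", np.round(K, 4))
print("ARE residual norm =", np.linalg.norm(residual))
```

Unlike Algorithm 2, this variant reads the nominal matrix `A0` in every iteration, which is exactly the model knowledge that the online PI algorithm avoids.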

By comparison, the online PI algorithm yields nearly the same solution as the existing offline method. However, the online PI method does not require knowledge of the nominal system matrix, which is the advantage of Algorithm 2. This shows that the PI algorithm is effective in solving the robust tracking problem. Figure 1 shows the evolution of the tracking-control signal. The convergence of the elements of the P matrix is shown in Figure 2, which shows that the optimal cost is obtained at time $t = 4 s$ after four updates of the controller parameters. The evolution of the system output and the reference signal for different values of the parameters s and l is presented in Figure 3, which shows that robust tracking is achieved for the uncertain system (37).
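The robust tracking behaviour can also be probed by simulating the closed loop at several corner values of the uncertain parameters. The following sketch assumes NumPy and assumes that the augmented dynamics take the form $\overset{\cdot}{X} = T (s) X + \bar{B} (l) u + N y_{r}$ with $u = KX$ , which is inferred from the matrices listed in this example. It uses the gain K obtained by Algorithm 2 above, a fixed-step Runge-Kutta integrator, and a hypothetical horizon of 60 s.

```python
import numpy as np

def T(s):
    """Augmented state matrix of Example 1 for the uncertain parameter s."""
    return np.array([[0.0, 2.0, 0.0, 0.0],
                     [2.0 + s, s, 0.0, 0.0],
                     [1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])

def B_bar(l):
    """Augmented input vector of Example 1 for the uncertain parameter l."""
    return np.array([0.0, l, 0.0, 0.0])

N = np.array([0.0, 0.0, -1.0, 0.0])
K = np.array([-8.0880, -6.5092, -3.6855, -1.0071])  # gain reported for Algorithm 2

def y_ref(t):
    return t + 5.0

def final_tracking_error(s, l, t_end=60.0, h=0.01):
    """Integrate the closed loop with RK4 and return y - y_r at t_end."""
    Ac = T(s) + np.outer(B_bar(l), K)
    def f(t, X):
        return Ac @ X + N * y_ref(t)
    X = np.array([0.2, 0.0, 0.15, 0.0])  # initial condition quoted in the text
    steps = int(round(t_end / h))
    for i in range(steps):
        t = i * h
        k1 = f(t, X)
        k2 = f(t + 0.5 * h, X + 0.5 * h * k1)
        k3 = f(t + 0.5 * h, X + 0.5 * h * k2)
        k4 = f(t + h, X + h * k3)
        X = X + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return X[0] - y_ref(steps * h)

corners = [(0.0, 1.0), (3.0, 1.0), (0.0, 3.0), (3.0, 3.0)]
errors = {c: final_tracking_error(*c) for c in corners}
for (s, l), e in errors.items():
    print(f"s = {s}, l = {l}: y - y_r at t = 60 s is {e:.2e}")
```

Checking only the corners of the parameter box is a spot check, not a proof; the robustness guarantee itself comes from the theorem for the matched case.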

Example 2. A mismatched uncertain system is considered as follows:

\begin{matrix} \overset{\cdot}{x} = [\begin{matrix} s & s - 1 \\ 2 & - 3 \end{matrix}] x + [\begin{matrix} 0 \\ l \end{matrix}] u, \\ y = [\begin{matrix} 1 & 0 \end{matrix}] x, \end{matrix}

(43)

where $s \in [- 2, 2]$ and $l \in [1, 3]$ are uncertain parameters. The reference signal is assumed to be $y_{r} = 3$ . The control objective is to design a state feedback law, $u = Kx$ , such that the system output $y = Cx$ asymptotically tracks the reference signal for every $s \in [- 2, 2]$ and $l \in [1, 3]$ .

Figure 1. Control signal.

Figure 2. P matrix iteration.

Figure 3. Evolution of system output and reference signal with different parameters.

We denote $A (s) = [\begin{matrix} s & s - 1 \\ 2 & - 3 \end{matrix}], B (l) = [\begin{matrix} 0 \\ l \end{matrix}]$ and $C = [1, 0]$ , respectively. Let $s_{0} = 0$ and $l_{0} = 1$ . According to (8), we will have

\begin{matrix} T (s) = [\begin{matrix} s & s - 1 & 0 \\ 2 & - 3 & 0 \\ 1 & 0 & 0 \end{matrix}], \bar{B} (l) = [\begin{matrix} 0 \\ l \\ 0 \end{matrix}], N = [\begin{matrix} 0 \\ 0 \\ - 1 \end{matrix}] . \end{matrix}

T (s) - T (s_{0}) = [\begin{matrix} s & s & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}], \bar{B} (l) - \bar{B} (l_{0}) = [\begin{matrix} 0 \\ l - 1 \\ 0 \end{matrix}] .

It is clear that the system is a mismatched system.
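Whether a perturbation is matched can be tested numerically: it is matched exactly when it lies in the range of $\bar{B} (l_{0})$ , that is, when the projector $I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+}$ annihilates it. The sketch below assumes NumPy; the function names `is_matched`, `T_ex1`, and `T_ex2` are hypothetical, and the perturbations are computed directly from the matrices $T (s)$ of the two examples.

```python
import numpy as np

def is_matched(B0, delta, tol=1e-12):
    """Return True if every column of `delta` lies in the range of B0."""
    projector = np.eye(B0.shape[0]) - B0 @ np.linalg.pinv(B0)
    return bool(np.linalg.norm(projector @ delta) <= tol)

def T_ex1(s):
    """Augmented state matrix of Example 1."""
    return np.array([[0.0, 2.0, 0.0, 0.0],
                     [2.0 + s, s, 0.0, 0.0],
                     [1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])

def T_ex2(s):
    """Augmented state matrix of Example 2."""
    return np.array([[s, s - 1.0, 0.0],
                     [2.0, -3.0, 0.0],
                     [1.0, 0.0, 0.0]])

s = 1.5  # any nonzero sample of the uncertain parameter
B0_ex1 = np.array([[0.0], [1.0], [0.0], [0.0]])
B0_ex2 = np.array([[0.0], [1.0], [0.0]])

matched_ex1 = is_matched(B0_ex1, T_ex1(s) - T_ex1(0.0))
matched_ex2 = is_matched(B0_ex2, T_ex2(s) - T_ex2(0.0))
print("Example 1 state perturbation matched:", matched_ex1)
print("Example 2 state perturbation matched:", matched_ex2)
```

In Example 2 the parameter s enters the first row of $T (s)$ , which the input does not reach, so the test fails there while it passes for Example 1.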

\bar{B} (l_{0})^{+} = ({\bar{B}}^{T} (l_{0}) \bar{B} (l_{0}))^{- 1} {\bar{B}}^{T} (l_{0}) = [\begin{matrix} 0 & 1 & 0 \end{matrix}]

I - \bar{B} (l_{0}) \bar{B} (l_{0})^{+} = [\begin{matrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{matrix}]

We choose $β = 1$ and denote $Δ T (s) = T (s) - T (s_{0})$ . Then, we have

\begin{matrix} Δ T {(s)}^{T} {(\bar{B} {(l_{0})}^{+})}^{T} \bar{B} {(l_{0})}^{+} Δ T (s) = [\begin{matrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}] = \bar{F}, \end{matrix}

and

\begin{matrix} Δ T {(s)}^{T} Δ T (s) = [\begin{matrix} s^{2} & s^{2} & 0 \\ s^{2} & s^{2} & 0 \\ 0 & 0 & 0 \end{matrix}] \leq [\begin{matrix} 4 & 4 & 0 \\ 4 & 4 & 0 \\ 0 & 0 & 0 \end{matrix}] = \bar{H} . \end{matrix}

Therefore, the positive definite weight matrix $\bar{Q}$ is calculated as follows:

\bar{Q} = \bar{F} + \bar{H} + β^{2} I = [\begin{matrix} 5 & 4 & 0 \\ 4 & 5 & 0 \\ 0 & 0 & 1 \end{matrix}]
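The bounds that define $\bar{F}$ and $\bar{H}$ can be verified on a grid of parameter values. This is a minimal sketch assuming NumPy; the grid of 41 points and the eigenvalue tolerance are hypothetical choices, and the function `bounds_hold` is not from the article.

```python
import numpy as np

def T(s):
    """Augmented state matrix of Example 2."""
    return np.array([[s, s - 1.0, 0.0],
                     [2.0, -3.0, 0.0],
                     [1.0, 0.0, 0.0]])

B0 = np.array([[0.0], [1.0], [0.0]])
B0_pinv = np.linalg.pinv(B0)
beta = 1.0

F_bar = np.zeros((3, 3))
H_bar = np.array([[4.0, 4.0, 0.0], [4.0, 4.0, 0.0], [0.0, 0.0, 0.0]])
Q_bar = F_bar + H_bar + beta**2 * np.eye(3)

def bounds_hold(s, tol=1e-9):
    """Check (B0^+ dT)^T (B0^+ dT) <= F_bar and dT^T dT <= H_bar in the matrix sense."""
    dT = T(s) - T(0.0)
    M = B0_pinv @ dT
    gap_F = F_bar - M.T @ M
    gap_H = H_bar - dT.T @ dT
    return bool(np.linalg.eigvalsh(gap_F).min() >= -tol
                and np.linalg.eigvalsh(gap_H).min() >= -tol)

all_ok = all(bounds_hold(s) for s in np.linspace(-2.0, 2.0, 41))
print("Q_bar =\n", Q_bar)
print("bounds hold on the grid:", all_ok)
```

A matrix inequality $A \leq B$ is read here as $B - A$ being positive semidefinite, which is why the smallest eigenvalue of each gap matrix is tested.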

By applying Algorithm 3, a robust tracking-control gain is calculated for the mismatched uncertain system (43). The initial augmented control law is selected as

K = [\begin{matrix} - 2 & - 2 & 0 \\ 2 & - 1 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 8 \end{matrix}] .

The initial condition of the augmented nominal system is selected as $X_{0} = [3, - 2, - 8]^{T}$ . Using MATLAB, after four iterations, the positive definite matrix P and the augmented control gain converge to the following solutions:

P = [\begin{matrix} 2.6301 & 0.4074 & 0.2535 \\ 0.4074 & 0.6118 & - 0.0741 \\ 0.2535 & - 0.0741 & 1.0090 \end{matrix}]

and

K = [\begin{matrix} - 0.4074 & - 0.6118 & 0.0741 \\ - 2.6301 & - 0.4074 & - 0.2535 \\ 0 & 0 & 0 \\ - 0.2535 & 0.0741 & - 1.0090 \end{matrix}]

Solving the ARE directly using MATLAB yields the following P matrix:

P = [\begin{matrix} 2.6232 & 0.4071 & 0.2373 \\ 0.4071 & 0.6076 & - 0.0730 \\ 0.2373 & - 0.0730 & 0.9687 \end{matrix}]

This shows that the PI algorithm is effective in solving the robust tracking problem for the mismatched uncertain system. Figure 4 shows the evolution of the tracking-control signal. The convergence of the elements of the P matrix is shown in Figure 5. The evolution of the output and the reference trajectory for different values of the parameters s and l is presented in Figure 6. As a result, robust output tracking is achieved for the mismatched uncertain system (43).

Figure 4. Control signal.

Figure 5. P matrix iteration.

Figure 6. Evolution of system output and reference signal with different parameters.

Conclusion

In this study, RL-based online PI algorithms were proposed to calculate a robust tracking control law for uncertain linear systems. The approach is based on online policy iteration and does not require the nominal system matrix. The robust tracking problem was transformed into an optimal control problem with a predefined cost function. Based on the corresponding augmented linear system, offline and online PI algorithms were established to obtain a robust tracking controller. Numerical experiments were presented to demonstrate the effectiveness of the theoretical results. The proposed method may be extended to tracking problems of uncertain discrete-time systems, which may be the subject of our future research.

Footnotes

Acknowledgements

The authors thank the editors and reviewers for their valuable comments. The authors would also like to thank Editage for English language editing.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Research Fund for High-level Talents of Liupanshui Normal University under grant LPSSYKYJJ202001, and the National Natural Science Foundation of China under grant 61463002.

ORCID iD

Dengguo Xu

References

1. Zhou, Doyle, Glover, et al. Robust and optimal control, Volume 40. New Jersey: Prentice Hall, 1996.

2. Lewis, Syrmos. Optimal control. Hoboken: John Wiley & Sons, 2012.

3. Petersen, Hollot. A Riccati equation approach to the stabilization of uncertain linear systems. Automatica 1986; 22(4): 397–411.

4. Schmitendorf. Designing stabilizing controllers for uncertain systems using the Riccati equation approach. IEEE Trans Automatic Control 1988; 33(4): 376–379.

5. Schmitendorf. A design methodology for robust stabilizing controllers. AIAA J Guid Control Dyn 1987; 10(2): 250–254.

6. Jabbari, Schmitendorf. A noniterative method for the design of linear robust controllers. IEEE Trans Automatic Control 1990; 35(8): 954–957.

7. Tsay. Robust control for linear uncertain systems via linear quadratic state feedback. Syst Control Lett 1990; 15(3): 199–205.

8. Dolphus, Schmitendorf. A non-iterative Riccati approach to robust control design. In: American control conference, San Diego, CA, USA. 1990, pp. 916–918. IEEE.

9. Schmitendorf, Barmish. Robust asymptotic tracking for linear systems with unknown parameters. Automatica 1986; 22(3): 355–360.

10. Benson, Schmitendorf. Augmented system approach to measurement-based robust tracking. In: Proceedings of 1994 American control conference, 29 June–1 July 1994, vol. 3, pp. 2960–2964. Baltimore, MD, USA: IEEE.

11. Shieh, Liang, Mao. Robust output tracking control of an uncertain linear system via a modified optimal linear-quadratic method. J Optimiz Theory App 2003; 117(3): 649–659.

12. Tan, Shu, Lin. An optimal control approach to robust tracking of linear systems. Int J Control 2009; 82(3): 525–540.

13. Vrabie, Pastravanu, Abu-Khalaf, et al. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 2009; 45(2): 477–484.

14. Bertsekas. Approximate policy iteration: a survey and some new methods. J Control Theory Appl 2011; 9(3): 310–335.

15. Kiumarsi, Vamvoudakis, Modares, et al. Optimal and autonomous control using reinforcement learning: a survey. IEEE Trans Neural Netw Learn Syst 2018; 29(6): 2042–2062.

16. Modares, Lewis. Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning. IEEE Trans Automat Contr 2014; 59(11): 3051–3056.

17. Qin, Zhang, Luo. Online optimal tracking control of continuous-time linear systems with unknown dynamics by using adaptive dynamic programming. Int J Control 2014; 87(5): 1000–1009.

18. Gao, Jiang. Adaptive dynamic programming and adaptive optimal output regulation of linear systems. IEEE Trans Automat Contr 2016; 61(12): 4164–4169.

19. Liu, Tong, et al. Integral barrier Lyapunov function based adaptive control for switched nonlinear systems. Sci China Inform Sci 2020; 63(3): 132203:1–132203:14.

20. Xie, Zhou, Yue, et al. Relaxed control design of discrete-time Takagi-Sugeno fuzzy systems: an event-triggered real-time scheduling approach. IEEE Trans Syst Man Cybern Syst 2018; 48(12): 2251–2262.

21. Wang. Optimal guaranteed cost tracking of uncertain nonlinear systems using adaptive dynamic programming with concurrent learning. Int J Control Autom Syst 2020; 18(5): 1116–1127.

22. Liu, Tang, Tong, et al. Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems. IEEE Trans Neural Netw Learn Syst 2014; 26(1): 165–176.

23. Wang, Liu. Adaptive critic nonlinear robust control: a survey. IEEE Trans Cybern 2017; 47(10): 3429–3451.

24. Wang, Liu, et al. An approximate optimal control approach for robust stabilization of a class of discrete-time nonlinear systems with uncertainties. IEEE Trans Syst Man Cybern Syst 2016; 46(5): 713–717.

25. Jiang, Zhang, Cui, et al. Robust control scheme for a class of uncertain nonlinear systems with completely unknown dynamics using data-driven reinforcement learning method. Neurocomputing 2018; 273: 68–77.

26. Yang, Zhong. Adaptive dynamic programming for robust regulation and its application to power systems. IEEE Trans Ind Electron 2018; 65(7): 5722–5732.

27. Lin. Robust control design: an optimal control approach, Volume 18. Chichester: John Wiley & Sons, 2007.

28. Kleinman. On an iterative technique for Riccati equation computations. IEEE Trans Autom Control 1968; 13(1): 114–115.

29. Ben-Israel, Greville TNE. Generalized inverses: theory and applications. New York: Springer, 2003.

30. She, et al. Robust tracking and disturbance rejection for linear uncertain system with unknown state delay and disturbance. IEEE/ASME Trans Mechatron 2018; 23(3): 1445–1455.

31. Liu, et al. Barrier Lyapunov function based adaptive fuzzy FTC for switched systems and its applications to resistance inductance capacitance circuit system. IEEE Trans Cybern 2020; 50(8): 3491–3502.

32. Zhang, Shi, Yang, et al. Robust adaptive control for continuous wheel slip rate tracking of vehicle with state observer. Measurement Control 2020; 53(7–8): 1331–1341.

33. Wang, Liu, et al. Adaptive neural tracking control for uncertain switched nonlinear non-lower triangular system with disturbances and dead-zone input. Int J Control Autom Syst 2020; 18(6): 1445–1452.

34. Zhang. Learning-based robust tracking control of quadrotor with time-varying and coupling uncertainties. IEEE Trans Neural Netw Learn Syst 2020; 31(1): 259–273.

Adaptive optimal control approach to robust tracking of uncertain linear systems based on policy iteration
