Self-learning control of model uncertain active suspension systems with observer

Abstract

This paper presents a self-learning control algorithm for model-uncertain suspension systems using the single network adaptive critic (SNAC) approach. First, a differential neural network (DNN) observer with a weight updating law is established to observe the uncertain dynamics. Then, the nominal optimal value function is approximated by a critic NN whose weights are updated by a novel learning law driven by the filtered parameter error. The online self-learning control policy is thus derived by approximately solving the Hamilton–Jacobi–Bellman (HJB) equation with the SNAC technique. A Lyapunov approach is used to establish the convergence of the entire closed-loop system composed of the DNN observer and the self-learning control policy. A quarter car suspension system is simulated to verify the effectiveness of the proposed approach. The simulation results illustrate that the designed method achieves good performance in terms of road holding and ride quality. In addition, its model-independent and online self-learning characteristics make it possible to design a high-performance vehicle active suspension controller.

Keywords

Observer, self-learning control, active suspension, single network adaptive critic

Introduction

A high-performance suspension system is the key to the pursuit of a smoother and safer car.¹ Correspondingly, the suspension system is expected to have more intelligence to adapt to different road inputs. The suspension system is affected by unknown road input, resulting in unavoidable vibrations that degrade vehicle ride comfort. Therefore, when designing the controller of the active suspension system, state variable information related to the vertical dynamics of the vehicle should be provided.² However, implementing a suspension control system based on traditional observation theory is challenging because the unknown road input is difficult to measure or observe while the vehicle is driving. Moreover, parameter uncertainty, such as in the sprung mass, also complicates the design of the suspension observer and controller. Enhancing active vibration control performance, as shown in refs. 3 and 4, is the main target of suspension control.

To deal with model uncertainty, the robust approach,⁵ the LMI-based approach,⁶ the sliding mode approach,^7,8,43 and the globally bounded Jacobian approach⁹ have been used to develop observers. In ref. 10, a robust decoupling observer considering disturbance is achieved, but the rank condition on the output and input distribution matrices is difficult to meet. To avoid the rank constraints related to the system matrix, Sedighi et al.¹¹ present a new development for unknown input observer design, which guarantees the stability of the closed-loop error system and provides designers with a greater degree of freedom in the design space. In ref. 12, a controller and fault identification method based on the separation principle is proposed to realize simultaneous decoupled identification and control. In ref. 13, an adaptive extended state observer with lumped uncertainties and unmeasured states is proposed to estimate the unknown coefficient. However, most of the above observer designs rely on knowing the model information a priori, which is a strict restriction in real applications.

Neural networks, with their good nonlinear approximation capability, are very suitable for designing observers of model-uncertain systems. The structure of a neural observer mainly includes two parts, that is, a neural network to identify the unknown nonlinearity and a traditional Luenberger-like observer to estimate the state. In refs. 14 and 15, under strong strictly positive real (SPR) conditions imposed on the output error equation, a structure using two independent single layer neural networks is proposed to estimate the state of affine and non-affine SISO nonlinear systems. Furthermore, a nonlinear observer for MIMO systems using a single layer neural network was developed in refs. 16 and 17, where the SPR restriction has been weakened to a certain extent. A neural network observer with a modified backpropagation algorithm was then proposed in ref. 18, where the SPR condition and other strong restrictions are removed. In particular, the differential neural network (DNN),¹⁹ which incorporates feedback design, provides a more efficient way to solve the state estimation problem of model-uncertain systems. In ref. 20, a DNN observer with a sliding mode updating rule was reported and the relevant observability conditions were removed. The new passivity analysis of DNNs proposed by Xiao et al.²¹ illustrates that the boundedness of the external input ensures the boundedness of the input–output signals for each block of the closed-loop error dynamics, which avoids the requirement of the persistency of excitation (PE) condition.

Recent studies have shown that a self-learning controller based on approximate dynamic programming (ADP) can avoid the requirement of model accuracy and achieve optimal control at the same time.^22,23 It is well known that the curse of dimensionality and the offline learning characteristics of dynamic programming make the existing optimal control strategies based on it inefficient. ADP, which is inspired by biological systems, greatly improves the computational efficiency of solving optimal control problems without introducing significant approximation errors. Since Werbos²⁴ introduced the commonly used actor-critic (AC) scheme for self-learning control design, various modifications to ADP have been developed, such as action dependent heuristic dynamic programming (ADHDP),²⁵ heuristic dynamic programming (HDP),²⁶ dual heuristic programming (DHP),²⁷ and Q-learning.²⁸ The naturally recursive structure of ADP is well suited to discrete-time systems, but the related results cannot be directly applied to continuous-time systems. Ref. 29 proposes a self-learning control method for continuous nonlinear systems, where an identifier is used to identify the unknown nonlinear dynamics, and the online self-learning law is then obtained by approximately solving the HJB equation through a critic network. However, it is limited to nonlinear affine systems. As claimed in refs. 30 and 31, the self-learning control method with the single network adaptive critic (SNAC) scheme has been shown to be an effective way to solve the HJB equation online and obtain the optimal control action.

It should be pointed out that most of the above-mentioned self-learning control studies assume that the state is fully known. This is a strong constraint, because in most cases the system state is not completely obtainable, as is the case for the suspension system studied in this paper. The main contributions of this paper can be summarized as follows:

(1) This is the first investigation of self-learning control of a model-uncertain suspension system using only the input and output of the system. Unlike the commonly used LQR controller design, which depends on the system model, this paper introduces an observer-based self-learning controller design without prior knowledge of the system dynamics, which makes the proposed method more suitable for practical application.

(2) The investigated algorithm is implemented in an observer–critic framework. First, a DNN observer with an online weight update law is used to identify the unknown system dynamics from the known system input/output information. Then, the self-learning control policy is derived with the help of the SNAC method. By introducing two novel tuning laws for the DNN observer and the SNAC, improved adaptability and robustness are achieved compared with the general ADP-based self-learning method.

(3) In addition, the entire learning process of the proposed method is updated online, and the stability of the entire closed-loop system is guaranteed by a properly designed composite Lyapunov function. The model-free and self-learning characteristics of the designed control method make it possible to develop a high-performance active suspension controller that can avoid the influence of uncertainty. Simulation results on a quarter car suspension model subject to unknown road input are presented to demonstrate the effectiveness of the designed self-learning control method.

The rest of this article is organized as follows. Section 2 presents the problem formulation. Section 3 illustrates the DNN observer design process. The observer-based self-learning control method for a quarter car active suspension system is developed in Section 4. Simulation results for a quarter car suspension system with time varying road input are given in Section 5. Finally, the conclusions are drawn in Section 6.

Problem formulation

The quarter car suspension model shown in Figure 1 is commonly used in the design of active suspension control systems.³² The state space equation of a quarter car active suspension system with time varying parameters can be expressed as

\begin{array}{l} \dot{x} = (A + Δ A) x + (B + Δ B) u + L {\dot{z}}_{r} \\ y = C x \end{array}

(1)

where $x = {[x_{1}, x_{2}, x_{3}, x_{4}]}^{T}$ is the suspension state vector, $x_{1} = z_{s} - z_{u}$ is the suspension deflection, $x_{2} = {\dot{z}}_{s}$ is the sprung mass velocity, $x_{3} = z_{u} - z_{r}$ is the tire deflection, and $x_{4} = {\dot{z}}_{u}$ is the unsprung mass velocity. Here $k_{s}$ is the suspension stiffness, $b_{s}$ is the suspension damping coefficient, $u = f_{a}$ is the active control input, $m_{s}$ is the sprung mass, which is equal to a quarter of the entire vehicle mass, $m_{u}$ is the unsprung mass, $k_{u}$ is the vertical stiffness of the tire, $z_{s}$ is the vertical displacement of the sprung mass, $z_{u}$ is the vertical displacement of the unsprung mass, and $z_{r}$ is the vertical displacement of the unknown road. $Δ A$ and $Δ B$ represent the uncertainties caused by the uncertain parameter $m_{s}$ and the unknown input $z_{r}$. The nominal system matrix is

A = \begin{bmatrix} 0 & 1 & 0 & - 1 \\ - \frac{k_{s}}{m_{s}} & - \frac{b_{s}}{m_{s}} & 0 & \frac{b_{s}}{m_{s}} \\ 0 & 0 & 0 & 1 \\ \frac{k_{s}}{m_{u}} & \frac{b_{s}}{m_{u}} & - \frac{k_{u}}{m_{u}} & - \frac{b_{s}}{m_{u}} \end{bmatrix}

Figure 1.

Diagram of a quarter car suspension model.
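As a minimal sketch of the nominal part of model (1), the state derivative can be written directly from the force balance on the two masses. The function names `quarter_car_derivative` and `simulate`, the argument names, and the sign convention of the actuator force (pushing the sprung mass up and the unsprung mass down) are assumptions made for illustration; the default numerical values are the nominal ones listed in Table 1.

```python
def quarter_car_derivative(x, u, zr_dot,
                           ms=250.0, mu=30.0, ks=15000.0, ku=15000.0, bs=1000.0):
    """Nominal quarter-car dynamics for x = [zs - zu, zs_dot, zu - zr, zu_dot]."""
    x1, x2, x3, x4 = x
    # Force transmitted through the passive suspension (spring plus damper).
    f_susp = ks * x1 + bs * (x2 - x4)
    dx1 = x2 - x4                       # rate of change of suspension deflection
    dx2 = (-f_susp + u) / ms            # sprung mass acceleration
    dx3 = x4 - zr_dot                   # rate of change of tire deflection
    dx4 = (f_susp - ku * x3 - u) / mu   # unsprung mass acceleration
    return [dx1, dx2, dx3, dx4]


def simulate(x0, u_of_t, zr_dot_of_t, dt=1e-4, t_end=1.0):
    """Forward-Euler integration of the nominal model; returns the final state."""
    x = list(x0)
    for k in range(int(round(t_end / dt))):
        t = k * dt
        dx = quarter_car_derivative(x, u_of_t(t), zr_dot_of_t(t))
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x
```

Calling `simulate([0.05, 0.0, 0.0, 0.0], lambda t: 0.0, lambda t: 0.0)` gives the passive response to an initial suspension deflection, which is a convenient baseline before any controller is attached.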

The active suspension controller design should meet the following three requirements:

1) The first main aspect is to ensure high-performance riding quality. This task aims to weaken the vibration force transmitted from unsprung mass to sprung mass through an advanced control approach, which is achieved by minimizing the sprung mass acceleration in the face of parameter uncertainties caused by sprung mass and time varying road input.

2) In addition, the wheel should maintain firm and uninterrupted contact with the road surface, and the normal tire load change related to the vertical deflection of the tire should be small to ensure good road holding performance.

3) Finally, the suspension deflection should not exceed the maximum suspension deflection $z_{\max}$, that is, $| z_{s} - z_{u} | \leq z_{\max}$.

The goal of active suspension design is to find a control policy that can not only keep the system state stable, but also minimize the required performance indicator, such that

V = \frac{1}{2} \int_{0}^{t} (x^{T} Q x + u^{T} R u) d τ

(2)

where the weighting matrices $Q$ and $R$ for the state and the control input, respectively, are selected by the designer.
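For a sampled trajectory, the performance index (2) can be approximated numerically. The sketch below uses the trapezoidal rule and assumes, purely for illustration, a diagonal $Q$ (passed as `q_diag`), a scalar $R$ (passed as `r`), and a scalar control input; the function name `quadratic_cost` is hypothetical.

```python
def quadratic_cost(xs, us, q_diag, r, dt):
    """Approximate V = 1/2 * integral(x^T Q x + u^T R u) with the trapezoidal rule.

    xs: list of state vectors, us: list of scalar inputs (same length as xs),
    q_diag: diagonal entries of Q, r: scalar R, dt: sampling period.
    """
    def integrand(x, u):
        return sum(q * xi * xi for q, xi in zip(q_diag, x)) + r * u * u

    values = [integrand(x, u) for x, u in zip(xs, us)]
    total = 0.0
    for a, b in zip(values[:-1], values[1:]):
        total += 0.5 * (a + b) * dt
    return 0.5 * total
```

Evaluating this index on the closed-loop trajectories of two controllers gives a single scalar for comparing them under the same $Q$ and $R$.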

Definition 1

As claimed by Liews,³³ if $u$ is continuous on $Ω$, $u (0) = 0$, $u$ stabilizes (1) on $Ω$, and $V (x_{0})$ is finite for all $x_{0} \in Ω$, then the control input $u$ is said to be admissible with respect to (1) on the compact set $Ω \subset R^{n}$, which is denoted by $u \in ψ (Ω)$.

With the help of knowledge in the field of optimal control, the Hamiltonian of (1) can be defined as

H (x, u, V_{x}) = V_{x}^{T} [(A + Δ A) x + (B + Δ B) u + L {\dot{z}}_{r}] + r (x, u)

(3)

where $V_{x} \overset{Δ}{=} \frac{\partial V (x)}{\partial x}$ denotes the partial derivative of the cost function $V (x)$ with respect to $x$, and $r (x, u)$ is the utility function whose integral defines the value function.

The nominal optimal value function $V^{*} (x)$ is expressed as

V^{*} (x) = \min_{u \in ψ (Ω)} \int_{t}^{\infty} r (x (τ), u (τ)) d τ

(4)

Meanwhile, equation (4) should satisfy the following HJB equation

0 = \min_{u \in ψ (Ω)} [H (x, u, V_{x}^{*})] = V_{x}^{* T} [(A + Δ A) x + (B + Δ B) u + L {\dot{z}}_{r}] + r (x, u)

(5)

where $V_{x}^{*} \overset{Δ}{=} \frac{\partial V^{*} (x)}{\partial x}$. Assuming that the minimum in equation (5) exists and is unique, the self-learning control policy $u^{*}$ can be obtained by solving $\partial H (x, u, V_{x}^{*}) / \partial u = 0$, such that

u^{*} = - \frac{1}{2} R^{- 1} {(\frac{\partial [(A + Δ A) x + (B + Δ B) u]}{\partial u})}^{T} V_{x}^{*}

(6)
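The step from the stationarity condition to (6) can be made explicit. The sketch below assumes a utility of the form $r (x, u) = x^{T} Q x + u^{T} R u$ with symmetric positive definite $R$; this form is an assumption chosen so that the factor $\frac{1}{2}$ in (6) is reproduced, since the text does not state $r (x, u)$ explicitly.

```latex
\begin{aligned}
H(x,u,V_x^{*}) &= V_x^{*T}\left[(A+\Delta A)x + (B+\Delta B)u + L\dot{z}_r\right] + x^{T}Qx + u^{T}Ru, \\
\frac{\partial H}{\partial u} &= (B+\Delta B)^{T}V_x^{*} + 2Ru = 0, \\
u^{*} &= -\tfrac{1}{2}R^{-1}(B+\Delta B)^{T}V_x^{*}.
\end{aligned}
```

Because $\partial [(A + Δ A) x + (B + Δ B) u] / \partial u = B + Δ B$, the last line coincides with (6).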

Then, the HJB equation (5) can be further expressed with respect to equation (6) as follows

0 = \min_{u \in ψ (Ω)} [H (x, u^{*}, V_{x}^{*})] = V_{x}^{* T} [(A + Δ A) x + (B + Δ B) u^{*} + L {\dot{z}}_{r}] + r (x, u^{*})

(7)

It can be seen that the nominal value function $V^{*} (x)$ must be known in advance to obtain the self-learning control policy $u^{*}$ in equation (6). However, the analytical solution of the partial differential equation (7) cannot be obtained because of the unknown state, model uncertainty, and unknown road input present in the suspension system. This paper addresses this problem by establishing an observer–critic framework via the single network adaptive critic method in the following two steps:

1) Based on the input/output information from the suspension system, an adaptive DNN observer with the weight updating law was established to observe the unknown state.

2) Based on the observations, a self-learning controller with the observer critic structure is developed via the single network adaptive critic approach.

Remark 1

The common LQR controller designs^34–36 are often based on formula (2), where all parameters are available in advance. In addition, the corresponding feedback control law is obtained by solving the Riccati equation offline. The disadvantage of this method is that the control gain cannot be updated online according to the system uncertainties caused by $m_{s}$ and the unknown road displacement $z_{r}$, which inevitably leads to unsatisfactory control performance. Therefore, the design of active suspension systems is expected to introduce a self-learning intelligent control method that can adapt to various driving conditions.

The differential neural network observer

Consider the following DNN to identify the nonlinear system (1), such that

\dot{x} = A x + W σ ({[x^{T}, u^{T}]}^{T}) + ξ

(8)

where $W \in R^{4 \times 4}$ represents the unknown nominal weight matrix, $σ (\cdot)$ denotes the nonlinear activation function, which is commonly chosen as the sigmoidal function $σ (\cdot) = \frac{α}{1 + e^{- β x}} - γ$ with properly selected parameters $α, β, γ$, and $ξ$ is regarded as the modeling error.
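The sigmoidal activation above can be sketched as an elementwise function. The name `sigmoid_activation` and the default values of `alpha`, `beta`, and `gamma` are illustrative assumptions, not values taken from the paper; the logistic term is evaluated in a numerically stable way to avoid overflow for large negative arguments.

```python
import math


def _logistic(t):
    """Numerically stable 1 / (1 + exp(-t))."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def sigmoid_activation(z, alpha=1.0, beta=1.0, gamma=0.0):
    """Elementwise sigma(z) = alpha / (1 + exp(-beta * z)) - gamma."""
    return [alpha * _logistic(beta * zi) - gamma for zi in z]
```

With `gamma = 0` the output lies between 0 and `alpha`, so `alpha` plays the role of an upper bound on the activation.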

Assumption 1 . The nominal weight $W$ and the approximate error vector $ξ$ are bounded by

{‖ W ‖}_{F} \leq \bar{W}, {‖ ξ ‖}_{F} \leq \bar{ξ}

(9)

Fact 1 (Garces³⁷). The nonlinear activation function $σ (\cdot)$ satisfies the generalized local Lipschitz condition and is bounded, such that

\begin{array}{l} ‖ σ ({[x^{T}, u^{T}]}^{T}) - σ ({[{\hat{x}}^{T}, u^{T}]}^{T}) ‖ \leq λ_{σ} ‖ x - \hat{x} ‖, (λ_{σ} > 0) \\ 0 < σ ({[x^{T}, u^{T}]}^{T}) \leq \bar{σ} \end{array}

(10)

Consider the following neuro-observer

\begin{array}{l} \dot{\hat{x}} = A \hat{x} + \hat{W} σ ({[{\hat{x}}^{T}, u^{T}]}^{T}) + K (C x - C \hat{x}) \\ \hat{y} = C \hat{x} \end{array}

(11)

where $\hat{x}$ denotes the observed state, and the observer gain $K$ is obtained by solving the following stability equation

{(A - K C)}^{T} P + P (A - K C) + 2 α P = - Q

(12)

where $Q$ and $P$ denote positive definite matrices and the parameter $α$ is specified in the subsequent proof.

The state observation error is expressed as

\tilde{x} = x - \hat{x}

(13)

From (8) and (11), the observation error dynamics are derived as

\begin{matrix} \dot{\tilde{x}} = (A - K C) \tilde{x} + W σ ({[x^{T}, u^{T}]}^{T}) + ξ - \hat{W} σ ({[{\hat{x}}^{T}, u^{T}]}^{T}) \\ = (A - K C) \tilde{x} + W [σ ({[x^{T}, u^{T}]}^{T}) - σ ({[{\hat{x}}^{T}, u^{T}]}^{T})] \\ + ξ + \tilde{W} σ ({[{\hat{x}}^{T}, u^{T}]}^{T}) \\ \tilde{y} = C \tilde{x} \end{matrix}

(14)

where $\tilde{W} = W - \hat{W}$.

Once the architecture of the DNN observer is determined, the next step is to design appropriate update rules to implement online learning. There are generally two design approaches:

1) The online learning law of the DNN observer is designed from commonly used learning rules, such as the backpropagation algorithm, and then a suitable Lyapunov function candidate is constructed to prove the convergence of the system.^11,15

2) To ensure the stability of the closed-loop system, the online learning law is derived from a quadratic function of the weight error and the observation error, such that the time derivative of the Lyapunov function is proved to be negative.^38,39

The main drawback of the first approach is its approximate treatment of the dynamic backpropagation problem through gradient approximation, which inevitably leads to parameter overflow problems. Therefore, we choose the second approach to develop the updating law for the proposed DNN observer in this paper.

Theorem 1

If the uncertain suspension model (1) is identified by the DNN observer (11) with the updating law

\dot{\hat{W}} = \frac{- L P C^{T} \tilde{y} σ^{T} ({[{\hat{x}}^{T}, u^{T}]}^{T}) - λ ‖ {\tilde{y}}^{T} C P ‖ L \hat{W}}{{‖ C ‖}^{2}}

(15)

where $P$ is the solution of equation (12), $L$ is a positive definite matrix, and $λ$ is a positive constant, then the state estimation error and the DNN weight error are uniformly ultimately bounded (UUB), that is, $\tilde{x} \in L_{\infty}$ and $\tilde{W} \in L_{\infty}$.
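One forward-Euler step of the observer (11) together with the weight law (15) can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the activation is a plain logistic function, $‖ C ‖$ is taken as the Frobenius norm, and $\hat{W}$ is sized $n \times (n + p)$ so that its product with $σ ({[{\hat{x}}^{T}, u^{T}]}^{T})$ is well defined. The signs of both terms are copied from (15) as printed.

```python
import math


def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def sigma(z):
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]


def observer_step(x_hat, W_hat, y, u, A, C, K, P, L, lam, dt):
    """Advance the observer state and the DNN weights by one Euler step."""
    s = sigma(list(x_hat) + list(u))                  # sigma([x_hat; u])
    y_err = [yi - yh for yi, yh in zip(y, matvec(C, x_hat))]  # y_tilde

    # Observer dynamics (11): A x_hat + W_hat sigma + K (y - C x_hat).
    x_dot = [a + w + k for a, w, k in
             zip(matvec(A, x_hat), matvec(W_hat, s), matvec(K, y_err))]

    # Weight law (15).
    Ct = transpose(C)
    LPCty = matvec(L, matvec(P, matvec(Ct, y_err)))   # L P C^T y_tilde
    ytCP = matvec(transpose(P), matvec(Ct, y_err))    # (y_tilde^T C P)^T
    gain = lam * math.sqrt(sum(v * v for v in ytCP))
    c_norm_sq = sum(c * c for row in C for c in row)
    LW = matmul(L, W_hat)
    W_dot = [[(-LPCty[i] * s[j] - gain * LW[i][j]) / c_norm_sq
              for j in range(len(s))] for i in range(len(W_hat))]

    x_next = [xi + dt * di for xi, di in zip(x_hat, x_dot)]
    W_next = [[wij + dt * dij for wij, dij in zip(wr, dr)]
              for wr, dr in zip(W_hat, W_dot)]
    return x_next, W_next
```

Note that when the output error is zero, both terms of (15) vanish and the weights stay constant, so learning is driven entirely by the measured output error.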

Proof. The weight updating law in (15) is derived by selecting the following quadratic function of the observation error and the weight error, such that

L_{I} = \frac{1}{2} {\tilde{x}}^{T} P \tilde{x} + \frac{1}{2} \mathrm{tr} \{{\tilde{W}}^{T} L^{- 1} \tilde{W}\}

(16)

The time derivative of $L_{I}$ can be obtained by using the error dynamic equation (14), such that

\begin{matrix} {\dot{L}}_{I} = \frac{1}{2} {\tilde{x}}^{T} [P (A - K C) + {(A - K C)}^{T} P] \tilde{x} \\ + {\tilde{x}}^{T} P W [σ ({[x^{T}, u^{T}]}^{T}) - σ ({[{\hat{x}}^{T}, u^{T}]}^{T})] \\ + {\tilde{x}}^{T} P ξ + {\tilde{x}}^{T} P \tilde{W} σ ({[{\hat{x}}^{T}, u^{T}]}^{T}) + \mathrm{tr} \{{\dot{\tilde{W}}}^{T} L^{- 1} \tilde{W}\} \end{matrix}

(17)

Since the DNN weight is updated by (15) and satisfies the inequality $t r {{\tilde{W}}^{T} \hat{W}} < ‖ \tilde{W} ‖ \bar{W} - {‖ \tilde{W} ‖}^{2}$ , then based on Assumption 1 and equation (16), ${\dot{L}}_{I}$ becomes

\begin{matrix} {\dot{L}}_{I} \leq \frac{1}{2} {\tilde{x}}^{T} [P (A - K C) + {(A - K C)}^{T} P] \tilde{x} \\ + {\tilde{x}}^{T} P W \tilde{σ} ({[x^{T}, u^{T}]}^{T}) + {\tilde{x}}^{T} P \tilde{W} \hat{σ} ({[{\hat{x}}^{T}, u^{T}]}^{T}) \\ + {\tilde{x}}^{T} P ξ + λ ‖ {\tilde{x}}^{T} ‖ ‖ P ‖ (‖ \tilde{W} ‖ \bar{W} - {‖ \tilde{W} ‖}^{2}) \\ \leq \frac{1}{2} {\tilde{x}}^{T} [P (A - K C) + {(A - K C)}^{T} P] \tilde{x} \\ + λ_{σ} ‖ {\tilde{x}}^{T} P ‖ \bar{W} ‖ \tilde{x} ‖ + ‖ {\tilde{x}}^{T} P ‖ ‖ \tilde{W} ‖ \bar{σ} \\ + λ ‖ {\tilde{x}}^{T} ‖ ‖ P ‖ (‖ \tilde{W} ‖ \bar{W} - {‖ \tilde{W} ‖}^{2}) + {\tilde{x}}^{T} P ξ \end{matrix}

(18)

Define $α = λ_{σ} \bar{W}$ and select proper $Q$ to satisfy the condition in (12), then we can get

{\dot{L}}_{I} \leq - \frac{1}{2} λ_{\min} (Q) {‖ \tilde{x} ‖}^{2} + ‖ \tilde{x} ‖ ‖ P ‖ (ξ + \frac{μ^{2}}{4 λ})

(19)

where $μ = ‖ λ \bar{W} + \bar{σ} ‖$.

The negative definiteness of ${\dot{L}}_{I}$ can be guaranteed if

‖ \tilde{x} ‖ > \frac{2 ‖ P ‖ (ξ + \frac{μ^{2}}{4 λ})}{λ_{\min} (Q)}

(20)

Thus ${\dot{L}}_{I} < 0$ holds whenever (20) is satisfied, which demonstrates that $‖ \tilde{x} ‖$ and $‖ \tilde{W} ‖$ are UUB by Lyapunov theory. This completes the proof of Theorem 1.
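One way to see where the constant $\frac{μ^{2}}{4 λ}$ in (19) comes from is to collect the terms of (18) that depend on $‖ \tilde{W} ‖$ and complete the square. The sketch below assumes $λ > 0$ and uses only the submultiplicative bound $‖ {\tilde{x}}^{T} P ‖ \leq ‖ \tilde{x} ‖ ‖ P ‖$.

```latex
\begin{aligned}
\|\tilde{x}^{T}P\|\,\|\tilde{W}\|\,\bar{\sigma}
 + \lambda\|\tilde{x}\|\,\|P\|\left(\|\tilde{W}\|\bar{W} - \|\tilde{W}\|^{2}\right)
 &\leq \|\tilde{x}\|\,\|P\|\left(\mu\|\tilde{W}\| - \lambda\|\tilde{W}\|^{2}\right),
 \qquad \mu = \lambda\bar{W} + \bar{\sigma}, \\
\mu\|\tilde{W}\| - \lambda\|\tilde{W}\|^{2}
 &= -\lambda\left(\|\tilde{W}\| - \frac{\mu}{2\lambda}\right)^{2} + \frac{\mu^{2}}{4\lambda}
 \leq \frac{\mu^{2}}{4\lambda}.
\end{aligned}
```

The bound is attained at $‖ \tilde{W} ‖ = μ / (2 λ)$, so a larger $λ$ shrinks this contribution to the residual set in (20).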

Remark 2

Since $L$ and $λ$ in (15) can be selected arbitrarily, the updating gains of the DNN observer are not limited by the value of $P$, which is obtained by solving equation (12). Hence, one can select $Q$ to make the bound on the state observation error in equation (20) as small as possible.

Observer-based self-learning controller design

This section is mainly devoted to the design of a self-learning control policy with the aid of the above-mentioned DNN observer using the single network adaptive critic approach.

The following single critic neural network is used to approximate the optimal value function, such that

V^{*} (\hat{x}) = W_{c}^{T} ψ (\hat{x}) + ξ_{c}

(21)

where $W_{c} \in R^{I}$ is the nominal weight vector, the activation function $ψ (\hat{x}) \in R^{I}$ is usually selected as the sigmoid nonlinear function, $I$ is the number of neurons, and $ξ_{c}$ denotes the approximation error.

The derivative of the optimal value function (21) in terms of $\hat{x}$ is

V_{\hat{x}}^{*} = \nabla ψ^{T} (\hat{x}) W_{c} + \nabla ξ_{c}

(22)

where $\nabla ψ (\hat{x}) = \frac{\partial ψ (\hat{x})}{\partial \hat{x}}$ and $\nabla ξ_{c} = \frac{\partial ξ_{c}}{\partial \hat{x}}$ represent the partial derivatives of $ψ (\hat{x})$ and $ξ_{c}$ with respect to $\hat{x}$, respectively.

By substituting (22) into (6), the self-learning control policy is derived as

u^{*} = - \frac{1}{2} R^{- 1} {(\frac{\partial [(A + Δ A) \hat{x} + (B + Δ B) u + L {\dot{z}}_{r}]}{\partial u})}^{T} (\nabla ψ^{T} (\hat{x}) W_{c} + \nabla ξ_{c})

(23)

Then, the approximate self-learning control is represented as

u = - \frac{1}{2} R^{- 1} {(\frac{\partial [(A + Δ A) \hat{x} + (B + Δ B) u + L {\dot{z}}_{r}]}{\partial u})}^{T} \nabla ψ^{T} (\hat{x}) {\hat{W}}_{c}

(24)

Next, after substituting equation (22) into equation (7), the HJB equation can be further obtained as follows

0 = {(\nabla ψ^{T} (\hat{x}) W_{c} + \nabla ξ_{c})}^{T} [(A + Δ A) \hat{x} + (B + Δ B) u + L {\dot{z}}_{r}] + r (\hat{x}, u)

(25)

The expression $(A + Δ A) \hat{x} + (B + Δ B) u + L {\dot{z}}_{r}$ in (25) is replaced by the state derivative $\dot{\hat{x}}$ of the DNN observer given in equation (11), so we have

0 = W_{c}^{T} \nabla ψ (\hat{x}) \dot{\hat{x}} + r (\hat{x}, u) + ξ_{H J B}

(26)

where $X = \nabla ψ (\hat{x}) \dot{\hat{x}}$, $Y = r (\hat{x}, u)$, and $ξ_{H J B}$ denotes the residual HJB approximation error.

Remark 3

Our previous study⁴⁰ shows that, compared with the traditional observer design driven by the observation error, an adaptive learning law that includes parameter error information can make the closed-loop system converge faster. Inspired by this result, the following analysis develops a novel learning law for the critic neural network driven by parameter errors, rather than the existing least squares³³ or gradient methods.⁴¹

The filtered variables $X_{f}$ and $Y_{f}$ of the auxiliary variables $X$ and $Y$ are defined as

\begin{cases} η {\dot{Y}}_{f} + Y_{f} = Y, & Y_{f} (0) = 0 \\ η {\dot{X}}_{f} + X_{f} = X, & X_{f} (0) = 0 \end{cases}

(27)

Further, two auxiliary regression variables $G$ and $H$ are designed as

\begin{cases} \dot{G} (t) = - η G (t) + X_{f} X_{f}^{T}, & G (0) = 0 \\ \dot{H} (t) = - η H (t) + X_{f} [\frac{(Y - Y_{f})}{η}], & H (0) = 0 \end{cases}

(28)

By solving the differential equations in (28), the following expressions can be obtained

\begin{cases} G (t) = \int_{0}^{t} e^{- η (t - r)} X_{f} (r) X_{f}^{T} (r) d r \\ H (t) = \int_{0}^{t} e^{- η (t - r)} X_{f} (r) \frac{(Y (r) - Y_{f} (r))}{η} d r \end{cases}

(29)
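The role of the regressor in (27) to (29) can be illustrated numerically. The sketch below integrates the $X_{f}$ filter in (27) and the $G$ equation in (28) with forward Euler for a two-dimensional regressor and reports the smallest eigenvalue of $G (t)$. The function names, the value `eta = 0.5`, the step size, and the two test regressors are assumptions made for illustration only.

```python
import math


def min_eig_sym2(G):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    a, b, d = G[0][0], G[0][1], G[1][1]
    return 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b * b)


def filtered_gram(regressor, eta=0.5, dt=1e-3, t_end=10.0):
    """Integrate eta*X_f' + X_f = X and G' = -eta*G + X_f X_f^T from zero initial values."""
    xf = [0.0, 0.0]
    G = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(int(round(t_end / dt))):
        X = regressor(k * dt)
        # G_dot = -eta * G + X_f X_f^T
        G = [[G[i][j] + dt * (-eta * G[i][j] + xf[i] * xf[j]) for j in range(2)]
             for i in range(2)]
        # X_f_dot = (X - X_f) / eta
        xf = [xf[i] + dt * (X[i] - xf[i]) / eta for i in range(2)]
    return G


# Two independent components excite both directions of the weight space.
G_rich = filtered_gram(lambda t: [math.sin(t), math.cos(t)])
# Two proportional components leave one direction unexcited.
G_poor = filtered_gram(lambda t: [math.sin(t), 2.0 * math.sin(t)])
```

For the first regressor the smallest eigenvalue of `G_rich` is clearly positive, whereas for the second it is numerically zero, which is why a persistent excitation condition on $X$ is needed for $G (t)$ to be positive definite.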

Finally, the learning rule of ${\hat{W}}_{c}$ is designed as

{\dot{\hat{W}}}_{c} = - μ J

(30)

where $μ$ represents the adaptive updating gain and the filtered vector $J$ is defined as $J = G (t) {\hat{W}}_{c} + H (t)$.

Theorem 2

If a control mechanism composed of the DNN observer (11) with the update law (15) and the self-learning control (23) with the learning law (30) is used to control system (1), then all the signals $\tilde{x}$, $\tilde{W}$, ${\tilde{W}}_{c}$ involved in the entire learning process consisting of the DNN observer, the SNAC, and the self-learning control policy are UUB.

Proof. Let us design the following composite Lyapunov function

L = L_{I} + L_{c}

(31)

where

L_{I}

is defined as in equation (16) and

L_{c}

is defined as

\begin{array}{l} L_{c} = L_{c 1} + L_{c 2} \\ L_{c 1} = \frac{1}{2} {\tilde{W}}_{c}^{T} μ^{- 1} {\tilde{W}}_{c} \\ L_{c 2} = Γ x^{T} x + κ V (x) \end{array}

(32)

From (29), we have

J = G (t) {\hat{W}}_{c} + H (t) = - G (t) {\tilde{W}}_{c} + ς_{f}

(33)

where ${\tilde{W}}_{c} = W_{c} - {\hat{W}}_{c}$ and $ς_{f} = - \int_{0}^{t} e^{- η (t - r)} X_{f} ξ_{H J B f} d r$ is assumed to be bounded, that is, $‖ ς_{f} ‖ \leq {\bar{ς}}_{f}$.

As claimed by Na,⁴² if $X$ satisfies the persistent excitation (PE) condition, then the matrix $G (t)$ defined in (29) is positive definite, that is, $λ_{\min} (G) > σ > 0$. Therefore, according to the equation ${\dot{\tilde{W}}}_{c} = - {\dot{\hat{W}}}_{c}$, the derivative of $L_{c 1}$ is expressed as

\begin{matrix} {\dot{L}}_{c 1} = {\tilde{W}}_{c}^{T} μ^{- 1} {\dot{\tilde{W}}}_{c} = - {\tilde{W}}_{c}^{T} G (t) {\tilde{W}}_{c} + {\tilde{W}}_{c}^{T} ς_{f} \\ \leq - ‖ {\tilde{W}}_{c} ‖ (σ ‖ {\tilde{W}}_{c} ‖ - {\bar{ς}}_{f}) \end{matrix}

(34)

Based on the well-known Cauchy’s inequality, that is, $a b \leq a^{2} δ / 2 + b^{2} / (2 δ)$ with $δ > 0$, (34) can be bounded as

{\dot{L}}_{c 1} \leq - (σ - \frac{1}{2 δ}) {‖ {\tilde{W}}_{c} ‖}^{2} + \frac{δ {\bar{ς}}_{f}^{2}}{2}

(35)

From (8) and (32), we have

\begin{matrix} {\dot{L}}_{c 2} = 2 Γ x^{T} (A x + W σ ({[x^{T}, u^{T}]}^{T}) + ξ) + κ \dot{V} (x) \\ \leq Γ (2 ‖ A ‖ + 2) {‖ x ‖}^{2} + Γ {‖ W σ ({[x^{T}, u^{T}]}^{T}) ‖}^{2} \\ + {‖ ξ ‖}^{2} - Γ κ λ_{\min} (Q) {‖ C ‖}^{2} {‖ x ‖}^{2} - Γ κ λ_{\min} (R) {‖ u ‖}^{2} \\ \leq - [Γ κ λ_{\min} (Q) {‖ C ‖}^{2} - Γ (2 ‖ A ‖ + 2)] {‖ x ‖}^{2} \\ + [Γ κ λ_{\min} (Q) {‖ C ‖}^{2} - ε^{- 1} Γ (2 ‖ B ‖ + 2)] {‖ x ‖}^{2} \\ - Γ κ λ_{\min} (R) {‖ u ‖}^{2} + Γ {\bar{W}}^{2} {\bar{σ}}^{2} \end{matrix}

(36)

Then the derivative of $L$ can be bounded from (19), (35), and (36), such that

\begin{matrix} \dot{L} \leq - (σ - \frac{1}{2 δ}) {‖ {\tilde{W}}_{c} ‖}^{2} - [Γ κ λ_{\min} (Q) {‖ C ‖}^{2} \\ - Γ (2 ‖ A ‖ + 2)] {‖ x ‖}^{2} - Γ κ λ_{\min} (R) {‖ u ‖}^{2} \\ - \frac{1}{2} λ_{\min} (Q) {‖ \tilde{x} ‖}^{2} + ‖ \tilde{x} ‖ ‖ P ‖ (ξ + \frac{{‖ μ ‖}^{2}}{4 λ}) + ϖ \end{matrix}

(37)

where

ϖ = Γ {\bar{W}}^{2} {\bar{σ}}^{2} + ε^{- 1} Γ {\bar{W}}^{2} {\bar{σ}}^{2} + \frac{δ {\bar{ς}}_{f}^{2}}{2}

is a positive constant.

Provided that the following conditions hold

\begin{array}{l} κ > \frac{Γ (2 ‖ A ‖ + 2)}{λ_{\min} (Q) {‖ C ‖}^{2}} \\ ‖ {\tilde{W}}_{c} ‖ > \sqrt{\frac{ϖ}{(σ - \frac{1}{2 δ})}} \\ ‖ \tilde{x} ‖ > \frac{2 ‖ P ‖ (ξ + \frac{{‖ μ ‖}^{2}}{4 λ})}{λ_{\min} (Q)} \end{array}

(38)

Then, $\dot{L} < 0$ holds. Hence, we conclude that the DNN weight error $\tilde{W}$ and the critic NN weight error ${\tilde{W}}_{c}$ in the whole closed-loop system are UUB. Thus, Theorem 2 is proved.

Based on the above analysis, the mechanism of the DNN observer-based self-learning control approach for model uncertain suspension systems is shown in Figure 2.

Figure 2.

Mechanism of the self-learning controller.

Remark 4

A self-learning control method is proposed for uncertain suspension systems. The significant advantage of the proposed method is that it removes the dependence on the model. This is achieved by employing a DNN observer to identify the unknown dynamics and the system state, where the weight updating law is obtained from a properly designed Lyapunov function instead of the commonly used backpropagation algorithm.⁴³ Moreover, all the signals involved in the entire learning process are UUB.

Based on the above analysis, the flowchart of the proposed control algorithm is depicted in Figure 3 and can be summarized as follows.

Step 1. Select the activation function $σ (\cdot)$, the observer gain $K$ in equation (11), and the updating gains $L, λ$ in equation (15) for the observer. $σ (\cdot)$ is usually selected as the sigmoidal function $σ (\cdot) = a / (1 + e^{- b x}) - c$, where $a$, $b$, $c$ are design constants. $W$ and $W_{c}$ are tuned online according to equations (15) and (30). Hence, there is no need to select the initial values of $W$ and $W_{c}$. Meanwhile, select the function $ψ (\cdot)$ in equation (21) and the updating gain $μ$ in equation (30) for the critic NN. $ψ (\cdot)$ is usually selected as a smooth function composed of different combinations of the selected suspension states.

Step 2. Then, the input/output data of the suspension system are used to train the neural networks, including the DNN observer in equation (11) and the critic NN in equation (21).

Step 3. Finally, the self-learning control law expressed in equation (24) is obtained based on the first two steps.

Figure 3.

Flowchart of the proposed control algorithm.

Simulation and analysis

In this section, a car active suspension model platform presented in Section 2 is numerically simulated to verify the effectiveness of the DNN observer designed in Section 3 and the self-learning controller developed in Section 4. Table 1 shows the relevant suspension parameters used in the simulation process.

Table 1.

Suspension nominal parameters.³⁵

Parameters	Value/unit
Sprung mass of a quarter car ($m_{s}$)	250 kg
Unsprung mass of a quarter car ($m_{u}$)	30 kg
Suspension stiffness ($k_{s}$)	15000 N/m
Vertical stiffness of the tire ($k_{u}$)	15000 N/m
Suspension damper ($b_{s}$)	1000 N·s/m
Vertical displacement of sprung mass ($z_{s}$)	m
Vertical displacement of unsprung mass ($z_{u}$)	m
Vertical displacement of the road ($z_{r}$)	m

In order to verify the robustness of the proposed method, we adopt the following random road input model

{\dot{z}}_{r} (t) = - 2 π f_{0} z_{r} (t) + 2 π \sqrt{G (n_{0}) v_{s}} ω (t)

(39)

where $f_{0}$ denotes the cut-off frequency, $G (n_{0})$ represents the pavement power spectral density at the reference spatial frequency $n_{0}$, $v_{s}$ represents the driving speed, and $ω (t)$ is a uniformly distributed white noise with a mean value of 0 and an intensity of 1.

Here, the parameters in (39) are chosen as $f_{0} = 0.07\ \mathrm{Hz}$ and $G (n_{0}) = 0.000256$.
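A discrete-time road profile can be generated from (39) with an Euler–Maruyama step. In the sketch below, the function name `road_profile`, the step size, the seed handling, and the use of Gaussian samples scaled by $1 / \sqrt{dt}$ to represent unit-intensity white noise are all assumptions made for illustration; only $f_{0}$ and $G (n_{0})$ are taken from the text.

```python
import math
import random


def road_profile(v_kmh, duration, dt=1e-3, f0=0.07, G_n0=0.000256, seed=0):
    """Sample z_r(t) from z_r' = -2*pi*f0*z_r + 2*pi*sqrt(G(n0)*v_s)*w(t)."""
    rng = random.Random(seed)
    v_s = v_kmh / 3.6                                  # km/h to m/s
    gain = 2.0 * math.pi * math.sqrt(G_n0 * v_s)
    zr = 0.0
    samples = [zr]
    for _ in range(int(round(duration / dt))):
        # Discrete approximation of unit-intensity white noise (assumed Gaussian).
        w = rng.gauss(0.0, 1.0) / math.sqrt(dt)
        zr += dt * (-2.0 * math.pi * f0 * zr + gain * w)
        samples.append(zr)
    return samples
```

Because the model is linear in the noise gain, two profiles generated with the same seed at different speeds differ only by the factor $\sqrt{v_{s}}$, which makes it easy to compare speeds on the same noise realization.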

The following smooth function is used to approximate the optimal value function of the self-learning controller, such that

\begin{matrix} V (x) = W_{c 1} x_{1}^{2} + W_{c 2} x_{1} x_{2} + W_{c 3} x_{1} x_{3} \\ + W_{c 4} x_{1} x_{4} + W_{c 5} x_{2} x_{3} + W_{c 6} x_{2} x_{4} + W_{c 7} x_{3} x_{4} \\ + W_{c 8} x_{2}^{2} + W_{c 9} x_{3}^{2} + W_{c 10} x_{4}^{2} \end{matrix}

(40)

where the weight vector of the SNAC is represented by

W_{c} = {[W_{c 1}, W_{c 2}, W_{c 3}, W_{c 4}, W_{c 5}, W_{c 6}, W_{c 7}, W_{c 8}, W_{c 9}, W_{c 10}]}^{T}
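The quadratic basis in (40) and the resulting control law (24) can be sketched as follows. The basis ordering follows (40). The input vector `B` used in the example, $[0, 1 / m_{s}, 0, - 1 / m_{u}]^{T}$, is an assumed nominal form that is not stated in the text, the uncertainty $Δ B$ is ignored, $R$ is taken as a scalar, and the function names are hypothetical.

```python
def basis(x):
    """Basis psi(x) of (40), ordered as W_c1 ... W_c10."""
    x1, x2, x3, x4 = x
    return [x1 * x1, x1 * x2, x1 * x3, x1 * x4, x2 * x3,
            x2 * x4, x3 * x4, x2 * x2, x3 * x3, x4 * x4]


def basis_jacobian(x):
    """Jacobian of psi with respect to x (10 rows, 4 columns)."""
    x1, x2, x3, x4 = x
    return [
        [2 * x1, 0.0, 0.0, 0.0],
        [x2, x1, 0.0, 0.0],
        [x3, 0.0, x1, 0.0],
        [x4, 0.0, 0.0, x1],
        [0.0, x3, x2, 0.0],
        [0.0, x4, 0.0, x2],
        [0.0, 0.0, x4, x3],
        [0.0, 2 * x2, 0.0, 0.0],
        [0.0, 0.0, 2 * x3, 0.0],
        [0.0, 0.0, 0.0, 2 * x4],
    ]


def value_gradient(x_hat, W_c_hat):
    """Gradient of the approximate value function: (grad psi)^T W_c_hat."""
    J = basis_jacobian(x_hat)
    return [sum(J[i][k] * W_c_hat[i] for i in range(10)) for k in range(4)]


def critic_control(x_hat, W_c_hat, B, R):
    """Approximate control u = -1/2 * R^-1 * B^T * (grad psi)^T * W_c_hat."""
    grad_V = value_gradient(x_hat, W_c_hat)
    return -0.5 / R * sum(b * g for b, g in zip(B, grad_V))
```

For example, with only $W_{c 8}$ nonzero the approximate value function reduces to $W_{c 8} x_{2}^{2}$, and the control becomes proportional to the estimated sprung mass velocity.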

The random road input with pavement roughness given in (39) is used in this paper to compare the effectiveness of the proposed control method with that of the LQR control method.⁴⁴ Moreover, different values of the longitudinal velocity are also considered in the simulations, as follows.

Case A. Simulation results with the road input (39) and a velocity of 20 km/h are illustrated in Figures 4 to 7. The results indicate that the proposed self-learning control approach is more effective than the LQR-controlled suspension. The ride quality is noticeably improved with respect to the suspension deflection, sprung mass acceleration, tire deflection, and unsprung mass velocity, which means that the suspension system is brought back to a stable state more quickly when encountering unknown road input. The suspension performance in terms of road holding is thus greatly improved.

Figure 4.

The suspension deflection response.

Figure 5.

The sprung mass acceleration response.

Figure 6.

The tire deflection response.

Figure 7.

The unsprung mass velocity response.

Case B: Simulation results with the road input (39) and velocity v = 60 km/h are presented in Figures 8–11. The proposed control method still performs better, with smaller fluctuations, than the commonly used LQR control method when the longitudinal speed changes. These results further verify the strong robustness and self-adaptive properties of the proposed control method. There are two main reasons for the performance improvements observed in the simulations above. First, the control law of the LQR method is based on an accurate suspension model, so its control accuracy cannot be guaranteed when the system is subject to time-varying parameters. In contrast, the proposed control method does not rely on a mathematical model but on the input/output data of the suspension system. Second, the feedback control law of the proposed method is updated online in response to time-varying quantities such as the longitudinal speed and the random road input, whereas for the LQR method and other model-based methods the feedback control law is fixed in advance. Therefore, it can be concluded that the self-adaptive property of the proposed method provides a more effective solution for suspension controller design and can greatly enhance the vehicle riding performance.

Figure 8.

The suspension deflection response.

Figure 9.

The sprung mass acceleration response.

Figure 10.

The tire deflection response.

Figure 11.

The unsprung mass velocity response.

To quantify the control performance of the proposed method, the root mean square (RMS) of the state error is adopted as the performance index for comparison:

RMS = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} e_{i}^{2}}

(41)

where n is the number of simulation steps and $e_{i}$ is the difference between the state variable with control and without control at the i-th step.
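As a small worked illustration of the RMS index, the sketch below assumes the usual definition, namely the square root of the mean of the squared per-step differences over the n simulation steps. The function name `rms` and the sample values are hypothetical and are not data from the paper.

```python
import math


def rms(differences):
    """Root mean square of a sequence of per-step state differences."""
    if not differences:
        raise ValueError("at least one simulation step is required")
    n = len(differences)
    return math.sqrt(sum(e * e for e in differences) / n)


# Illustrative per-step differences for one state variable (made-up values).
sample = [0.01, -0.02, 0.015, -0.005]
sample_rms = rms(sample)
```

A smaller RMS value for a given state variable indicates smaller fluctuations of that variable over the simulated interval.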

Table 2 also shows that the RMS values of all state variables for the proposed method are smaller than those of the commonly used LQR method, which further demonstrates the improved performance of the proposed method.

Table 2.

The Root Mean Square Values.

Case	Method	x₁	x₂	x₃	x₄
Case A	Proposed	0.00476	0.01088	0.00019	0.00912
Case A	LQR	0.01465	0.03029	0.00170	0.14900
Case B	Proposed	0.00269	0.00189	0.00085	0.07610
Case B	LQR	0.01445	0.01295	0.00109	0.001558

The above simulation results obtained with the proposed DNN observer and self-learning controller indicate that all the evaluation indexes of the suspension system are improved compared with the commonly used LQR control method. The proposed observer-based self-learning mechanism can update itself online according to the unknown road displacement without requiring the complete suspension model. It can therefore be concluded that the self-learning and model-independent characteristics of the developed control approach offer a new idea for the design of model uncertain suspension control systems and can clearly improve road holding and ride quality. The aim of pursuing a high-performance suspension system is thus achieved.

Conclusions

This paper presents a new way to realize simultaneous online state observation and control for model uncertain suspension systems. The main contributions are as follows. First, the proposed self-learning control method does not require complete information about the suspension system, which makes it easier to apply in practical systems. Second, the self-learning control policy can be updated online under unknown road input, which can greatly enhance the suspension performance in terms of road holding and ride quality. Finally, simulation results for a quarter car suspension system under unknown road input are presented; the responses of the suspension deflection, sprung mass acceleration, tire deflection and unsprung mass velocity validate the effectiveness of the proposed approach. Future work will focus on developing a more practical intelligent controller for automotive suspension systems that considers actual state constraints and actuator saturation limits.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (Grant No.62073298), Key Scientific and Technological Project of Henan Province (Grant No. 212102310454 and No. 212102210237) and ZZULI Doctoral Fund for Scientific Research (Grant No. JDG20190099).

ORCID iD

Zhijun Fu

References

Chen, Cao, Yuan, et al. Observer-based adaptive neural network backstepping sliding mode control for switched fractional order uncertain nonlinear systems with unmeasured state. Meas Control 2021; 54(7–8): 1–14. DOI: 10.1177/00202940211021107

Kim, Kang, Kim, et al. Simultaneous estimation of state and unknown road roughness input for vehicle suspension control system based on discrete Kalman filter. Proc IMechE D: J Automobile Eng 2020; 234(6): 1610–1622.

Ebrahimi, Nopour, Dabbagh. Smart laminates with an auxetic ply rested on visco-Pasternak medium: Active control of the system’s oscillation. Engineering with Computers 2021. DOI: 10.1007/s00366-021-01533-1

Zhang, Wang, Tazeddinova, et al. Enhancing active vibration control performances in a smart rotary sandwich thick nanostructure conveying viscous fluid flow by a PD controller. Waves in Random and Complex Media 2021. DOI: 10.1080/17455030.2021.1948627

Alif, Darouach, Boutayeb. Design of robust reduced-order unknown-input filter for a class of uncertain linear neutral system. IEEE Trans Automatic Control 2010; 55(1): 6–19.

Zemouche, Boutayeb. On LMI conditions to design observers for Lipschitz nonlinear systems. Automatica 2013; 49: 585–591.

Kalsi