Abstract

This paper focuses on a robust parameter estimation algorithm for linear parameter varying (LPV) models. Classical robust identification techniques deal with polluted training data, for example, outliers in white noise. The paper extends this robustness to both symmetric and asymmetric noise with outliers. Instead of assuming Gaussian white noise, the paper employs the asymmetric Laplace (AL) distribution, an asymmetric heavy-tailed distribution, to model a broader class of noise, in particular asymmetrically distributed noise. To make this complex distribution tractable, the AL distribution is represented as the product of a Gaussian distribution and an exponential distribution. Then, a shift parameter is introduced as the regression term to connect the probabilistic model of the noise with the predicted output, which obeys the shifted AL distribution. In this way, the posterior probability distribution of the unobserved variables can be deduced, and the robust parameter estimation problem is solved in the general expectation maximization (EM) algorithm framework. To demonstrate the advantage of the proposed algorithm, a numerical simulation example is employed to identify the parameters of an LPV model and to illustrate the convergence.

Keywords

System identification, expectation maximization algorithm, asymmetric Laplace distribution

Introduction

Modern industry is developing fast; industrial processes are becoming larger and their internal relations more complex.¹ Establishing accurate process models is the foundation of control, fault diagnosis and condition monitoring, and therefore higher requirements are put forward for modeling precision.^2–4 As a result, much existing research focuses on modeling techniques for processes with complex structure, strong dynamic characteristics and strong nonlinearity in modern industrial process control.⁵ Linear parameter varying (LPV) systems attract much attention since they combine a simple linear structure with variable parameters, and can therefore model complex and time-varying processes. In addition, many controllers based on LPV systems have been studied to achieve stable and robust control.^6,7

Because of their simple structure, the parameters of most classical LPV models can be identified by the least squares method. However, this requires high-quality training data, which is rarely available in practice. As for state-space LPV models, various parameter identification problems have been studied. For instance, a parameter estimation algorithm for state-space LPV models was proposed in reference⁸, and an identification approach for multiple-input single-output systems with time delay was proposed in Yang et al.⁹ In addition, to reduce computational cost, Turk, Gillis, Pipeleers & Swevers proposed a regularization method that uses $l_{2, 1}$ norm regularization to identify LPV systems.

As for single-input single-output systems, much research focuses on robust identification and addresses various challenges posed by the training data.^10–12 For example, sensors can hardly collect all the information completely and accurately. Therefore, algorithms that can estimate the parameters when output measurements are randomly missing are proposed in Yang et al.¹³ and Liu et al.¹⁴ The output-error (OE) structure, which offers the flexibility to model various kinds of noise, is employed in Yang et al.¹⁵ as the linear structure of LPV models, and a robust parameter estimation method is proposed there to identify the parameters when the training data is polluted by outliers and the system has time delays. Moreover, for piecewise LPV systems, a membership recognition method is proposed in Hamdi et al.¹⁶ In practice, because of different working environments, different types of sensors are usually employed, and therefore the sampling frequencies may also differ. To solve this, a dual-rate parameter estimation approach is proposed in Yang and Yin,¹⁷ in which the LPV systems are identified under the expectation maximization framework. Among these works, the Bayesian approach is of particular significance, and there the chosen probabilistic model largely determines the identification ability. For example, the Laplace distribution and the Student's t distribution are employed to model the noise because, unlike the Gaussian distribution, they are heavy-tailed, and therefore the system identification algorithms developed from them are robust to outliers.¹⁸ However, these heavy-tailed distributions are not perfect, since their only advantage over the Gaussian distribution is the ability to model outliers. In practice, the training data contains not only outliers but also other kinds of noise, for example, asymmetric noise. To be more specific, these probability distributions are symmetric about the vertical axis ( $x = 0$ ), and therefore cannot model the asymmetric noise that is common in practice.

Work based on an asymmetric distribution can model both asymmetric and symmetric noise. Unlike the Gaussian distribution, the asymmetric Laplace distribution is heavy-tailed, so an algorithm built on it is robust to outliers. In addition, the Laplace distribution is a special case of the asymmetric Laplace distribution, namely its symmetric form. Thus, the asymmetric Laplace distribution can model more general noise and therefore offers stronger robustness. However, applying the asymmetric Laplace distribution is still challenging because of its complex structure. Therefore, the expectation maximization (EM) algorithm framework is employed to deduce the LPV system identification algorithm. To sum up, the contributions of the paper are as follows:

A more robust parameters estimation method of LPV models is proposed. The algorithm identifies the parameters when the training data are polluted by both symmetric/asymmetric noise with outliers.

The noise is modeled by a more general probabilistic model, the asymmetric Laplace distribution, which can model both symmetric and asymmetric noise. In order to utilize the asymmetric Laplace distribution in parameter estimation, the asymmetric Laplace distribution with a shift parameter is represented as the product of an exponential distribution and a normal distribution, and the posterior probability distribution of the random variable that obeys the exponential distribution is deduced.

The EM algorithm is utilized to deduce the parameter estimation algorithm. The random variable that obeys the exponential distribution is treated as the latent variable in the EM algorithm, which makes it possible to use the asymmetric Laplace distribution as the noise model.

The rest of the paper is organized as follows. The LPV system identification problem is described in Section II, where the properties of the asymmetric Laplace distribution are also introduced. In Section III, the EM algorithm is briefly introduced and the parameter estimation algorithm for LPV models is then deduced under the EM algorithm framework. To illustrate the robustness and advantages of the proposed algorithm, a numerical example is presented in Section IV. Finally, the conclusion is given in Section V.

Problem formulation

The LPV–OE model

The proposed algorithm focuses on the identification of LPV models, and the local models are output error (OE) models. The LPV–OE models are defined as equation (1).

\begin{matrix} x_{k} = G (w_{k}, z^{- 1}) u_{k} \\ y_{k} = x_{k} + Δ_{k} \end{matrix}

(1)

where $y_{k}, u_{k}, w_{k}$ are the output, input and scheduling variable respectively, $x_{k}$ is the noise-free output and $Δ_{k}$ is the noise. $z^{- 1}$ is the backward shift operator defined as equation (2).

x_{k - 1} = z^{- 1} x_{k}

(2)

The transfer function $G (w_{k}, z^{- 1})$ is defined as equation (3).

G (w_{k}, z^{- 1}) = A^{†} (w_{k}, z^{- 1}) B (w_{k}, z^{- 1})

(3)

The polynomials $A (w_{k}, z^{- 1})$ and $B (w_{k}, z^{- 1})$ express how the parameters of LPV–OE models vary with the scheduling variable $w_{k}$ .

\begin{matrix} A (w_{k}, z^{- 1}) & = 1 + \sum_{i = 1}^{n_{a}} a_{i} (w_{k}) z^{- i} \\ B (w_{k}, z^{- 1}) & = \sum_{j = 1}^{n_{b}} b_{j} (w_{k}) z^{- j} \end{matrix}

(4)

Normally, $a_{i} (w_{k}), b_{j} (w_{k})$ are defined by a group of meromorphic functions, $Ξ, Ψ$ , respectively.

\begin{matrix} a_{i} (w_{k}) & = a_{0}^{i} + \sum_{p = 1}^{n_{α}} a_{p}^{i} Ξ_{p} (w_{k}), i = 1, \dots, n_{a} \\ b_{j} (w_{k}) & = b_{0}^{j} + \sum_{q = 1}^{n_{β}} b_{q}^{j} Ψ_{q} (w_{k}), j = 1, \dots, n_{b} \end{matrix}

(5)

where $n_{α}, n_{β}$ are the numbers of meromorphic functions, which are known a priori. $A^{†} (w_{k}, z^{- 1})$ denotes the adjoint matrix of $A (w_{k}, z^{- 1})$ .

The LPV–OE models defined by equations (1)–(5) can be rewritten in the autoregressive form of equations (6)–(8).

x_{k} = ϕ_{k}^{T} θ

(6)

where

ϕ_{k} = {[\begin{matrix} - x_{k - 1} & - Ξ_{1} (w_{k}) x_{k - 1} & \dots & - Ξ_{n_{α}} (w_{k}) x_{k - 1} \\ - x_{k - 2} & - Ξ_{1} (w_{k}) x_{k - 2} & \dots & - Ξ_{n_{α}} (w_{k}) x_{k - 2} \\ \dots & \dots & \dots & \dots \\ - x_{k - n_{a}} & - Ξ_{1} (w_{k}) x_{k - n_{a}} & \dots & - Ξ_{n_{α}} (w_{k}) x_{k - n_{a}} \\ u_{k - 1} & Ψ_{1} (w_{k}) u_{k - 1} & \dots & Ψ_{n_{β}} (w_{k}) u_{k - 1} \\ u_{k - 2} & Ψ_{1} (w_{k}) u_{k - 2} & \dots & Ψ_{n_{β}} (w_{k}) u_{k - 2} \\ \dots & \dots & \dots & \dots \\ u_{k - n_{b}} & Ψ_{1} (w_{k}) u_{k - n_{b}} & \dots & Ψ_{n_{β}} (w_{k}) u_{k - n_{b}} \end{matrix}]}^{T}

(7)

θ = {[\begin{matrix} a_{0}^{1} & a_{1}^{1} & \dots & a_{n_{α}}^{1} \\ a_{0}^{2} & a_{1}^{2} & \dots & a_{n_{α}}^{2} \\ \dots & \dots & \dots & \dots \\ a_{0}^{n_{a}} & a_{1}^{n_{a}} & \dots & a_{n_{α}}^{n_{a}} \\ b_{0}^{1} & b_{1}^{1} & \dots & b_{n_{β}}^{1} \\ b_{0}^{2} & b_{1}^{2} & \dots & b_{n_{β}}^{2} \\ \dots & \dots & \dots & \dots \\ b_{0}^{n_{b}} & b_{1}^{n_{b}} & \dots & b_{n_{β}}^{n_{b}} \end{matrix}]}^{T}

(8)

The aim of this paper is to identify the parameters of LPV–OE models, $θ$ , based on the collected input/output data when the training data is polluted by outliers and asymmetrically distributed noise. In order to get the robustness, the noise $Δ_{k}$ in equation (1) is assumed to obey the asymmetric Laplace distribution.
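To make the regression form concrete, the following minimal sketch simulates the noise-free output $x_{k} = ϕ_{k}^{T} θ$ by building the regressor row by row from equations (4) and (5). The function name `simulate_lpv_oe`, the array layout of the coefficients and the polynomial basis $Ξ_{p} (w) = Ψ_{p} (w) = w^{p}$ are assumptions made for illustration only; samples before the start of the record are taken as zero.

```python
import numpy as np

def simulate_lpv_oe(u, w, a_coef, b_coef):
    """Noise-free output x_k = phi_k^T theta of an LPV-OE model.

    a_coef[i-1, p] stands for a_p^i and b_coef[j-1, q] for b_q^j in equation (5).
    Assumption (not from the paper): polynomial basis Xi_p(w) = Psi_p(w) = w**p.
    """
    u, w = np.asarray(u, dtype=float), np.asarray(w, dtype=float)
    a_coef = np.asarray(a_coef, dtype=float)
    b_coef = np.asarray(b_coef, dtype=float)
    theta = np.concatenate([a_coef.ravel(), b_coef.ravel()])  # ordering of equation (8)
    x = np.zeros(len(u))
    for k in range(len(u)):
        basis_a = w[k] ** np.arange(a_coef.shape[1])  # [1, Xi_1(w_k), ...]
        basis_b = w[k] ** np.arange(b_coef.shape[1])  # [1, Psi_1(w_k), ...]
        phi = []
        for i in range(1, a_coef.shape[0] + 1):
            past_x = x[k - i] if k >= i else 0.0      # zero initial conditions
            phi.extend(-past_x * basis_a)
        for j in range(1, b_coef.shape[0] + 1):
            past_u = u[k - j] if k >= j else 0.0
            phi.extend(past_u * basis_b)
        x[k] = np.dot(phi, theta)
    return x
```

With a constant scheduling variable the model reduces to a linear time-invariant OE model, which gives a simple way to sanity-check the regressor ordering against a hand-computed impulse response.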

Asymmetric Laplace distribution

The probability density function of a p dimensional random variable that obeys centralized asymmetric Laplace (CAL) distribution is shown in equation (9).

f (v | α, Σ) = \frac{2 \exp \{v^{'} Σ^{- 1} α\}}{{(2 π)}^{p / 2} | Σ |^{1 / 2}} \times {(\frac{v^{'} Σ^{- 1} v}{2 + α^{'} Σ^{- 1} α})}^{v / 2} K_{v} (u)

(9)

where $v = (2 - p) / 2$ in the exponent and in $K_{v}$ is a scalar order (not to be confused with the argument $v$ of the density), and $u = \sqrt{(2 + α^{'} Σ^{- 1} α) (v^{'} Σ^{- 1} v)}$ . $Σ$ and $α \in R^{p}$ denote the scale matrix and the skewness respectively. In this paper, the symbol $V \sim \mathrm{CAL}_{p} (α, Σ)$ denotes that a p-dimensional random variable obeys the CAL distribution.

However, the CAL distribution cannot be utilized in clustering and system identification directly because it forces all components to meet at the same location. Thus, by adding a shift parameter $μ \in R^{p}$ of the same dimension to the CAL distribution, a random variable X that obeys the shifted (non-centralized) asymmetric Laplace (SAL) distribution can be defined as $X = V + μ$ . Herein, the symbol $X \sim \mathrm{SAL}_{p} (α, Σ, μ)$ denotes that the p-dimensional random variable X obeys the SAL distribution, whose probability density function is given in equation (10).

ξ (x | α, Σ, μ) = \frac{2 \exp \{{(x - μ)}^{'} Σ^{- 1} α\}}{{(2 π)}^{p / 2} | Σ |^{1 / 2}} \times {(\frac{δ (x, μ | Σ)}{2 + α^{'} Σ^{- 1} α})}^{v / 2} K_{v} (u)

(10)

where $u = {((2 + α^{T} Σ^{- 1} α) δ (x, μ | Σ))}^{1 / 2}$ and $δ (x, μ | Σ) = (x - μ)^{T} Σ^{- 1} (x - μ)$ ; $v = (2 - p) / 2$ ; $Σ$ and $α$ are defined as before. The paper focuses on the identification of single-input single-output systems, which is a one-dimensional problem ( $p = 1$ ).

To illustrate the asymmetry of the SAL distribution, the probability density function of the SAL distribution with skewness $5$ , shift $5$ and scale $100$ is shown in Figure 1.

Figure 1.

The probability density function of the SAL distribution.

However, because of its complexity, the SAL distribution still cannot be utilized as directly as the Gaussian distribution. The SAL distribution has been studied in detail in reference¹⁹, which shows that the random variable $V \sim \mathrm{CAL}_{p} (α, Σ)$ can be generated by equation (11).

V = Z α + \sqrt{Z} Y

(11)

where Z obeys the exponential distribution with rate parameter $1$ ; Y obeys the normal distribution with zero mean and variance $Σ$ , i.e. $Y \sim N (0, Σ)$ . Z and Y are independent.

Thus, the random variable $X \sim \mathrm{SAL}_{p} (α, Σ, μ)$ can be generated by adding the shift parameter $μ$ to equation (11).

X = μ + Z α + \sqrt{Z} Y

(12)

In other words, a random variable that obeys the SAL distribution is conditionally normally distributed once the value z of the random variable Z is observed.

X | Z = z \sim N (μ + z α, z Σ)

(13)
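Equations (12) and (13) translate directly into a sampling recipe for the scalar case. The sketch below is illustrative only; the function name `sample_sal` and its argument names are assumptions. It draws the exponential mixing variable first and then the conditionally Gaussian sample.

```python
import numpy as np

def sample_sal(n, alpha, sigma2, mu=0.0, rng=None):
    """Draw n scalar samples X = mu + Z*alpha + sqrt(Z)*Y as in equation (12)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.exponential(scale=1.0, size=n)           # Z ~ Exp(1)
    y = rng.normal(0.0, np.sqrt(sigma2), size=n)     # Y ~ N(0, Sigma), independent of Z
    return mu + z * alpha + np.sqrt(z) * y
```

Because Z has unit mean and unit variance, the law of total expectation and variance applied to equation (13) gives a sample mean close to $μ + α$ and a sample variance close to $Σ + α^{2}$ , which is a convenient check that the skewness parameter shifts the mean as well as the shape.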

In the system identification problem described above, the noise obeys the CAL distribution, and therefore the output obeys the SAL distribution with $μ = x_{k}$ . In addition, if Z in the SAL distribution were observable, the output would reduce to a normally distributed variable and the identification problem could be solved directly. However, Z is a hidden quantity of the noise and cannot be measured. According to Bayes' formula, the probability density function of Z given the observation can be represented by equation (14).

\begin{array}{l} f_{Z} (z | X = x) = \frac{f_{X} (x | Z = z) h (z)}{f_{X} (x)} \\ = \frac{z^{v - 1}}{2} {(\frac{δ (x, μ | Σ)}{2 + α^{'} Σ^{- 1} α})}^{- v / 2} \times \frac{\exp \{- \frac{1}{2 z} δ (x, μ | Σ) - \frac{z}{2} (2 + α^{'} Σ^{- 1} α)\}}{K_{v} (\sqrt{(2 + α^{'} Σ^{- 1} α) δ (x, μ | Σ)})} \end{array}

(14)

It can be seen that equation (14) is the density of a generalized inverse Gaussian (GIG) distribution with $a \equiv 2 + α^{'} Σ^{- 1} α$ and $b \equiv δ (x, μ | Σ)$ , which has been studied in detail. In this paper, the expectations of a GIG-distributed random variable and of its reciprocal are needed; they are shown in equations (15) and (16).

E [X] = \sqrt{\frac{b}{a}} R_{v} (\sqrt{ab})

(15)

E [1 / X] = \sqrt{\frac{a}{b}} R_{v} (\sqrt{ab}) - \frac{2 v}{b}

(16)

where $R_{v} (z) : = K_{v + 1} (z) / K_{v} (z)$ .
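Equations (15) and (16) can be cross-checked numerically without evaluating Bessel functions, by integrating the GIG kernel $x^{v - 1} \exp \{- (a x + b / x) / 2\}$ read off from equation (14). The sketch below is only a sanity check under stated assumptions: the function name `gig_moments` and the grid limits are hypothetical choices that suit moderate values of $a$ and $b$, not part of the source.

```python
import numpy as np

def gig_moments(v, a, b, n_grid=40001):
    """Return (E[X], E[1/X]) for the kernel x**(v-1) * exp(-(a*x + b/x)/2)."""
    # Substituting x = exp(t) gives a smooth integrand that decays very fast
    # in both directions, so a plain sum on a uniform t-grid is accurate.
    t = np.linspace(-40.0, 15.0, n_grid)
    log_w = v * t - 0.5 * (a * np.exp(t) + b * np.exp(-t))
    w = np.exp(log_w - log_w.max())   # rescaled weights; the scale cancels in the ratios
    mean_x = np.sum(np.exp(t) * w) / np.sum(w)
    mean_inv_x = np.sum(np.exp(-t) * w) / np.sum(w)
    return mean_x, mean_inv_x
```

Comparing the two returned values with equations (15) and (16) for a few pairs $(a, b)$ is a quick way to catch sign or index slips before the moments are used inside the E-step.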

Robust parameters estimation algorithm for LPV–OE systems

Brief introduction of the EM algorithm

The EM algorithm is an iterative method for maximum likelihood estimation, and its flexibility in dealing with missing data (hidden states) attracts wide attention. In the EM algorithm, the expectation step (E-Step) and the maximization step (M-Step) run alternately until convergence. In the E-Step, the conditional expectation of the log-likelihood function with respect to the latent state is calculated. In the M-Step, the conditional expectation calculated in the E-Step is maximized. Several methods have been studied to carry out the maximization, including the Newton–Raphson method and derivative analysis. The procedure of the EM algorithm is summarized below:

(1) Initialization. According to the identification target, the complete data set and the parameters to be estimated are divided into the observable data set $D_{obs}$ , the missing data set $D_{mis}$ and the parameter set $Θ$ . In addition, the maximum number of iterations and the expected residual are defined before the iteration.

(2) E-Step: Calculate the conditional expectation of the log-likelihood function with respect to the latent state, given the current estimate $Θ^{l}$ .

Q (Θ | Θ^{l}) = E_{D_{mis} | D_{obs}, Θ^{l}} \{\log P (D_{mis}, D_{obs} | Θ)\}

(17)

(3) M-Step: Maximize the $Q (Θ | Θ^{l})$ by finding suitable $Θ^{l + 1}$

Q (Θ^{l + 1} | Θ^{l}) > Q (Θ^{l} | Θ^{l})

(18)

(4) Let $Θ^{l} = Θ^{l + 1}$ and run steps (2) and (3) alternately until the parameters converge.

Mathematical formulation of the robust parameters estimation approach

A data-driven system identification algorithm is proposed in order to estimate the parameters of LPV–OE systems. Compared with mechanism models, the proposed algorithm only utilizes limited prior knowledge and the input-output data to establish the process models. However, many quantities would need to be identified, for example, the model orders and parameters, the meromorphic functions, etc. The paper concentrates on parameter estimation; thus, some information about the structure of the LPV–OE models is assumed to be known or fixed a priori.

Assumption 1. The structures of the LPV–OE systems are assumed based on the prior information. Thus, the ground truth orders of the LPV–OE models to be identified are $n_{a}$ and $n_{b}$ , and the meromorphic functions used to construct the polynomials $A (w, z^{- 1})$ and $B (w, z^{- 1})$ are pre-selected.

Remark 1. The paper focuses on the parameter estimation algorithm; therefore, the structure information is assumed to be known. To find a suitable structure, especially for complex processes, identifying the parameters and the structure iteratively is a good approach. After the structure has been predefined, the parameters can be estimated by the proposed algorithm. If the identified LPV–OE model cannot meet the performance index, the structure can be adjusted and the parameters identified again until the performance is satisfactory.

Under this assumption, the only information left to be determined is the parameters of the models. Thus, the emphasis is on estimating the parameters of LPV–OE models from the collected input and output data. In particular, the difficulty of identification depends on the quality of the collected data. In this section, the robust parameter estimation algorithm is introduced in detail, and its robustness to outliers and asymmetric noise is illustrated.

As a statistical learning method, the proposed algorithm is formulated within the EM algorithm framework. The data-driven parameter estimation algorithm identifies the parameters of LPV–OE systems by extracting the features from the input $\{u_{k}\}_{k = 1 : L}$ , output $\{y_{k}\}_{k = 1 : L}$ and scheduling $\{w_{k}\}_{k = 1 : L}$ data, all of which are collected as the observed data set $D_{obs} = \{y_{1 : L}, w_{1 : L}, u_{1 : L}\}$ in the EM algorithm framework. Moreover, there is a significant variable that cannot be observed, namely the exponentially distributed variable in the SAL distribution, which is therefore treated as the latent variable. These latent variables form the unobservable data set $D_{mis} = \{ϵ_{1 : L}\}$ , and in the EM algorithm framework the expectation of the log-likelihood function with respect to the latent variable is maximized. Finally, the parameter set $Θ$ collects all the parameters to be identified, including the LPV–OE model parameters $θ$ and the noise parameters $α, Σ$ , namely $Θ = \{θ, α, Σ\}$ .

E-Step: Firstly, the likelihood function of all data set including observed and unobserved variables $D_{obs}$ and $D_{mis}$ is calculated in the E-step.

\begin{matrix} p (D_{obs}, D_{mis} | Θ) = p (y_{1 : L}, u_{1 : L}, w_{1 : L}, ϵ_{1 : L} | Θ) \\ = p (y_{1 : L} | u_{1 : L}, w_{1 : L}, ϵ_{1 : L}, Θ) \times p (ϵ_{1 : L} | u_{1 : L}, w_{1 : L}, Θ) \times C \end{matrix}

(19)

where $C = p (u_{1 : L}, w_{1 : L} | Θ)$ is a constant since both $u_{1 : L}$ and $w_{1 : L}$ can be observed. Thus, the log-likelihood function of equation (19) can be deduced.

\begin{matrix} \log p (D_{obs}, D_{mis} | Θ) \\ = \log p (y_{1 : L} | u_{1 : L}, w_{1 : L}, ϵ_{1 : L}, Θ) \\ + \log p (ϵ_{1 : L} | u_{1 : L}, w_{1 : L}, Θ) + \log C \\ = \sum_{k = 1}^{L} \log p (y_{k} | u_{1 : k - 1}, w_{k}, ϵ_{k}, Θ) \\ + \sum_{k = 1}^{L} \log p (ϵ_{k} | Θ) + \log C \end{matrix}

(20)

The Q-function in the EM algorithm, conditional expectation of the log-likelihood function with respect to the latent variables, can be represented by equation (21).

\begin{matrix} Q (Θ | Θ^{s}) \\ = E_{ϵ_{1 : L} | D_{obs}, Θ^{s}} \{\sum_{k = 1}^{L} \log p (y_{k} | u_{1 : k - 1}, w_{k}, ϵ_{k}, Θ) + \sum_{k = 1}^{L} \log p (ϵ_{k} | Θ) + C_{2}\} \\ = \sum_{k = 1}^{L} \int p (ϵ_{k} | D_{obs}, Θ^{s}) \times \log p (y_{k} | u_{1 : k - 1}, w_{k}, ϵ_{k}, Θ) d ϵ_{k} \\ + \sum_{k = 1}^{L} \int p (ϵ_{k} | D_{obs}, Θ^{s}) \times \log p (ϵ_{k} | Θ) d ϵ_{k} + C_{3} \end{matrix}

(21)

In order to deduce the Q-function, the unknown quantities in equation (21) have to be studied, including

$p (ϵ_{k} | D_{obs}, Θ^{s})$ ;

$\int p (ϵ_{k} | D_{obs}, Θ^{s}) \times logp (y_{k} | u_{1 : k - 1}, w_{k}, ϵ_{k}, Θ) d ϵ_{k}$ ;

$\int p (ϵ_{k} | D_{obs}, Θ^{s}) \times logp (ϵ_{k} | Θ) d ϵ_{k}$ .

The posterior probability distribution of the latent variable $ϵ_{1 : L}$ can be deduced by using Bayes formula.

\begin{matrix} p (ϵ_{k} | y_{1 : L}, u_{1 : L}, w_{1 : L}, Θ^{s}) \\ = \frac{p (y_{k} | ϵ_{k}, u_{1 : k - 1}, w_{1 : k}, Θ^{s}) p (ϵ_{k} | Θ^{s})}{p (y_{k} | y_{1 : k - 1}, u_{1 : k - 1}, w_{1 : k}, Θ^{s})} \end{matrix}

(22)

As introduced above, the SAL distribution reduces to a Gaussian distribution when the latent variable $ϵ$ is observable. Thus, when the latent variable is assumed to be observed, the conditional probability distribution of the output data $y_{k}$ is the Gaussian distribution represented by equation (23).

p (y_{k} | ϵ_{k}, u_{1 : k - 1}, w_{1 : k}, Θ^{s}) = \frac{1}{\sqrt{2 π ϵ_{k} Σ}} \exp (- \frac{{(y_{k} - G^{s} (w_{k}, z^{- 1}) u_{k} - ϵ_{k} α)}^{2}}{2 ϵ_{k} Σ})

(23)

When the latent variable cannot be observed, the output data obeys the SAL distribution whose probability density function is equation (10), and the random variable $ϵ_{k}$ obeys the exponential distribution with rate 1.

In this way, all the terms of the posterior probability density of the latent variable $ϵ$ are available, and the posterior is given in equation (24).

\begin{matrix} p (ϵ_{k} | y_{1 : L}, u_{1 : L}, w_{1 : L}, Θ^{s}) = \frac{ϵ_{k}^{- \frac{1}{2}}}{2} (\frac{δ (y_{k}, G^{s} (w_{k}, z^{- 1}) u_{k} | Σ)}{2 + α^{T} Σ^{- 1} α})^{- \frac{1}{4}} \\ \times \frac{\exp \{- \frac{1}{2 ϵ_{k}} δ (y_{k}, G^{s} (w_{k}, z^{- 1}) u_{k} | Σ) - \frac{ϵ_{k}}{2} (2 + α^{T} Σ^{- 1} α)\}}{K_{\frac{1}{2}} (\sqrt{(2 + α^{T} Σ^{- 1} α) (δ (y_{k}, G^{s} (w_{k}, z^{- 1}) u_{k} | Σ))})} \end{matrix}

(24)

where $δ (y_{k}, G^{s} (w_{k}, z^{- 1}) u_{k} | Σ) = (y_{k} - G^{s} (w_{k}, z^{- 1}) u_{k})^{T} Σ^{- 1} (y_{k} - G^{s} (w_{k}, z^{- 1}) u_{k})$ . $K_{v}$ is the modified Bessel function of the third kind with order $v$.

Comparing equation (24) with the generalized inverse Gaussian (GIG) distribution, it is clear that equation (24) is the GIG density with $v = 1 / 2$ , $a \equiv 2 + α^{T} Σ^{- 1} α$ and $b \equiv δ (y_{k}, G^{s} (w_{k}, z^{- 1}) u_{k} | Σ)$ . Herein, the properties of the GIG distribution are utilized to calculate the posterior mean of the latent variable, $E [ϵ_{k}]$ , and of its reciprocal, $E [1 / ϵ_{k}]$ .

E [ϵ_{k}] = \sqrt{\frac{b}{a}} R_{\frac{1}{2}} (\sqrt{ab})

(25)

E [\frac{1}{ϵ_{k}}] = \sqrt{\frac{a}{b}} R_{\frac{1}{2}} (\sqrt{ab}) - \frac{1}{b}

(26)

where $R_{1 / 2} (x) : = K_{3 / 2} (x) / K_{1 / 2} (x)$ , $E [ϵ_{k}] \overset{Δ}{=} E_{1}$ , $E [1 / ϵ_{k}] \overset{Δ}{=} E_{2}$ .
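For the scalar case the Bessel ratio has a closed form, which makes the role of the two expectations explicit. The following short specialization of equations (25) and (26) is a worked sketch rather than part of the original derivation; it relies on the standard half-integer identities $K_{1 / 2} (x) = \sqrt{π / (2 x)} e^{- x}$ and $K_{3 / 2} (x) = K_{1 / 2} (x) (1 + 1 / x)$ .

R_{\frac{1}{2}} (x) = 1 + \frac{1}{x}

E_{1} = \sqrt{\frac{b}{a}} (1 + \frac{1}{\sqrt{ab}}) = \sqrt{\frac{b}{a}} + \frac{1}{a}

E_{2} = \sqrt{\frac{a}{b}} (1 + \frac{1}{\sqrt{ab}}) - \frac{1}{b} = \sqrt{\frac{a}{b}}

Since $b$ is the squared residual scaled by $Σ$ , $E_{2}$ is inversely proportional to the absolute residual $| y_{k} - G^{s} (w_{k}, z^{- 1}) u_{k} |$ , so samples that lie far from the current model prediction receive small weights.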

Since all the unknown items in equation (21) have been deduced, the Q-function can be obtained by substituting equations (25) and (26) into (21). For the sake of presentation, the Q-function is divided into a sum of terms. Because the M-step of the EM algorithm maximizes the Q-function over the parameters $Θ$ , the constant term, which does not affect the maximizer, is neglected. In this way, the Q-function is written as the sum of two terms, $Q (Θ | Θ^{s}) = Q_{1} (Θ | Θ^{s}) + Q_{2} (Θ | Θ^{s})$ .

\begin{matrix} Q_{1} (Θ | Θ^{s}) = \sum_{k = 1}^{L} \int p (ϵ_{k} | D_{obs}, Θ^{s}) \times \log p (y_{k} | u_{1 : k - 1}, w_{k}, ϵ_{k}, Θ) d ϵ_{k} \\ = \sum_{k = 1}^{L} \int p (ϵ_{k} | D_{obs}, Θ^{s}) \\ \times \log \{\frac{1}{\sqrt{2 π} \sqrt{ϵ_{k} Σ}} \exp (- \frac{{(y_{k} - G^{s} (w_{k}, z^{- 1}) u_{k} - ϵ_{k} α)}^{2}}{2 ϵ_{k} Σ})\} d ϵ_{k} \\ = \sum_{k = 1}^{L} \{- \frac{1}{2} \log (2 π) - \frac{1}{2} \log (Σ) - \frac{1}{2} E (\log (ϵ_{k})) \\ - E_{2} \frac{1}{2 Σ} {(y_{k} - G^{s} (w_{k}, z^{- 1}) u_{k} - E_{1} α)}^{2}\} \end{matrix}

(27)

\begin{matrix} Q_{2} (Θ | Θ^{s}) = \sum_{k = 1}^{L} \int p (ϵ_{k} | D_{obs}, Θ^{s}) \times \log p (ϵ_{k} | Θ) d ϵ_{k} \\ = \sum_{k = 1}^{L} \int p (ϵ_{k} | D_{obs}, Θ^{s}) \times \log (\exp (- ϵ_{k})) d ϵ_{k} \\ = - \sum_{k = 1}^{L} E_{1} \end{matrix}

(28)

M-Step: Deducing the iteration formulas of the parameters $Θ$ to maximize the log likelihood function (Q-function).

To update the skewness $α$ , note that the second term of the Q-function, $Q_{2} (Θ | Θ^{S})$ , does not depend on $α$ . Taking the derivative of $Q_{1} (Θ | Θ^{S})$ with respect to $α$ and setting it equal to zero gives the iteration formula of the skewness $α$ , which is shown in equation (29).

α^{s + 1} = \frac{1}{L} \frac{\sum_{k = 1}^{L} (y_{k} - G^{s} (w_{k}, z^{- 1}) u_{k})}{\sum_{k = 1}^{L} α^{s}}

(29)

When it comes to the iteration formula of the parameters of LPV models $θ$ , the Newton–Raphson method is utilized to search for the suitable $θ$ that maximizes the first item of the Q-function $Q_{1} (Θ | Θ^{S})$ since there is also no connection between $Q_{2} (Θ | Θ^{S})$ and $θ$ .

θ^{s + 1} = θ^{s} - {(\frac{d^{2} Q (θ)}{d θ^{2}} + μ I)}^{- 1} \frac{dQ (θ)}{d θ} |_{θ = θ^{s}}

(30)

where the first-order derivative of $Q_{1} (Θ | Θ^{S})$ is deduced in equation (31), and the second-order derivative of it is deduced in equation (32).

\frac{dQ (θ)}{d θ} = \frac{2}{L} {\sum_{k = 1}^{L} E_{2} (y_{k} - {\hat{ϕ}}_{k}^{T} θ - E_{1} α) {\hat{ϕ}}_{k}}

(31)

\frac{d^{2} Q (θ)}{d θ^{2}} = \frac{2}{L} {\sum_{k = 1}^{L} E_{2} {\hat{ϕ}}_{k}^{T} {\hat{ϕ}}_{k}}

(32)

where ${\hat{ϕ}}_{k}$ is the estimate of $ϕ_{k}$ , constructed from the estimated noise-free output in equation (33).

{\hat{x}}_{k} = G^{s} (w_{k}, z^{- 1}) u_{k}

(33)

The complete LPV–OE models’ parameters estimation algorithm is summarized as in Algorithm 1.

Algorithm 1: LPV Models’ Parameters Estimation Algorithm
Input: Observed data set $D_{obs} = \{y_{1 : L}, u_{1 : L}, w_{1 : L}\}$
Output: Parameters to be estimated $Θ = \{θ, α, Σ\}$
Initialize the iteration counter $l = 1$ ; set the target residual $E$ and the maximum number of iterations $l_{max}$ ;
Initialize the parameters to be estimated $θ^{l}, α^{l}$ ;
repeat
    E-Step:
        Calculate the posterior probability density of $ϵ_{k}$ based on Eq. (24);
        Calculate the posterior means $E_{1}$ of $ϵ_{k}$ and $E_{2}$ of $1 / ϵ_{k}$ based on Eq. (25) and Eq. (26) respectively;
    M-Step:
        Calculate the skewness $α^{l}$ based on Eq. (29);
        Calculate the parameters $θ^{l}$ of the LPV model based on the skewness $α^{l}$ and Eqs. (30)–(32);
    Calculate the noise-free output $\{\hat{x}_{k}\}_{k = 1 : L}$ based on the parameters $θ^{l}$ and Eq. (33);
    Calculate the residual $Res = y - \hat{x}$ ;
    Set $l = l + 1$ ;
until $Res < E$ or $l > l_{max}$
Return $θ$ , $α$ ;
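The structure of Algorithm 1 can be illustrated on a simplified problem. The sketch below is not the authors' implementation: it assumes that the regressors $ϕ_{k}$ are observed exactly (a plain linear regression instead of the output-error setting), it also updates the scale $Σ$ , and it replaces the Newton–Raphson step by closed-form conditional maximization. The updates are derived directly from the complete-data log-likelihood implied by equation (13) under these assumptions and are not claimed to coincide with equations (29)–(32). The function name `em_sal_regression` is hypothetical.

```python
import numpy as np

def em_sal_regression(phi, y, n_iter=200, eps=1e-12):
    """EM sketch for y_k = phi_k^T theta + noise, with scalar CAL(alpha, sigma2) noise.

    Assumption: phi is fully observed, unlike the LPV-OE case where it must be estimated.
    """
    theta = np.linalg.lstsq(phi, y, rcond=None)[0]     # least-squares initial guess
    alpha = 0.0
    sigma2 = np.var(y - phi @ theta)
    for _ in range(n_iter):
        # E-step: the posterior of the mixing variable is GIG with v = 1/2
        r = y - phi @ theta
        a = 2.0 + alpha ** 2 / sigma2
        b = np.maximum(r ** 2, eps) / sigma2           # guard against zero residuals
        e1 = np.sqrt(b / a) + 1.0 / a                  # E[eps_k], closed form for v = 1/2
        e2 = np.sqrt(a / b)                            # E[1/eps_k], closed form for v = 1/2
        # M-step, one block at a time: theta by weighted least squares
        lhs = phi.T @ (phi * e2[:, None])
        rhs = phi.T @ (e2 * y - alpha)
        theta = np.linalg.solve(lhs, rhs)
        # then the skewness and the scale from the updated residuals
        r = y - phi @ theta
        alpha = r.sum() / e1.sum()
        sigma2 = np.mean(e2 * r ** 2 - 2.0 * alpha * r + e1 * alpha ** 2)
    return theta, alpha, sigma2
```

The weights `e2` play the role described for $E_{2}$ : a sample with a large residual contributes little to the weighted least-squares step, while the skewness estimate absorbs the nonzero mean of the asymmetric noise.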

Simulation example

Consider the following LPV–OE model

\begin{matrix} C (v_{k}, d^{- 1}) z_{k} = E (v_{k}, d^{- 1}) u_{k} \\ y_{k} = z_{k} + Δ_{k} \end{matrix}

(34)

where

\begin{matrix} C (v_{k}, d^{- 1}) = 1 + c_{1} (v_{k}) d^{- 1} + c_{2} (v_{k}) d^{- 2} \\ c_{1} (v_{k}) = 1 - 0.5 v_{k} - 0.1 v_{k}^{2} \\ c_{2} (v_{k}) = 0.5 - 0.7 v_{k} - 0.1 v_{k}^{2} \end{matrix}

(35)

and

\begin{matrix} E (v_{k}, d^{- 1}) = e_{1} (v_{k}) d^{- 1} + e_{2} (v_{k}) d^{- 2} \\ e_{1} (v_{k}) = 0.5 - 0.4 v_{k} + 0.01 v_{k}^{2} \\ e_{2} (v_{k}) = 0.5 - 0.3 v_{k} - 0.02 v_{k}^{2} \end{matrix}

(36)

where $z_{k}$ is the noise-free output and $Δ_{k}$ is the noise. As the robustness to asymmetric noise is emphasized in this paper, the noise is assumed to obey the asymmetric Laplace distribution rather than a symmetric distribution such as the Gaussian distribution (white noise) or the Laplace distribution. $v_{k}$ in equation (34) is the scheduling variable of the LPV model. In addition, the parameters $c_{1}, c_{2}, e_{1}, e_{2}$ vary with the scheduling variable, and the relationships between the time-varying parameters and the scheduling variable are defined in equations (35) and (36).

In the simulation example, Gaussian white noise is used as the input signal. It is a common and effective input signal when the hardware conditions permit, and it can fully excite the process and provide the data needed for identification. At the same time, in this single-input single-output system, the input signal also serves as the scheduling variable, i.e. $u_{k} = v_{k}$ .

Moreover, outliers are common in practice. Since the SAL distribution is heavy-tailed and contains the Laplace distribution as a special case, the proposed algorithm is also robust to outliers. Thus, the output data is additionally polluted by outliers in order to verify this robustness.

The training data are shown in Figure 2, where 10% of the output data are polluted by outliers that are 10 times bigger than the noise-free output. By using the polluted data and Algorithm 1, the parameters of the LPV–OE model can be identified, and the comparison of the estimated parameters with the ground truth parameters is shown in Figure 3, which also illustrates the convergence of the parameters. The values of the parameters at each iteration are shown in blue, and the red lines are the ground truth parameters. In the experiment, with the initialization parameters (2), all the parameters converge to the ground truth values and remain stable after 100 iterations. Thus, the proposed algorithm converges locally, and it can identify the parameters of LPV–OE models accurately when the training data is polluted by asymmetric noise and outliers.
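A data set of this kind can be generated along the following lines. This is a hedged sketch, not the setup used for Figure 2: the sample size, the input standard deviation `input_std`, the noise parameters `alpha` and `sigma2` and the seed are hypothetical values, and an outlier is taken to mean a sample replaced by 10 times the noise-free output, which is one possible reading of the description above.

```python
import numpy as np

def make_training_data(n=500, input_std=0.3, alpha=0.5, sigma2=0.01,
                       outlier_frac=0.1, seed=0):
    """Simulate equations (34)-(36) with SAL noise and outliers (assumed settings)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(0.0, input_std, size=n)      # scheduling variable
    u = v.copy()                                # u_k = v_k
    z = np.zeros(n)                             # noise-free output, zero initial state
    for k in range(2, n):
        c1 = 1.0 - 0.5 * v[k] - 0.1 * v[k] ** 2
        c2 = 0.5 - 0.7 * v[k] - 0.1 * v[k] ** 2
        e1 = 0.5 - 0.4 * v[k] + 0.01 * v[k] ** 2
        e2 = 0.5 - 0.3 * v[k] - 0.02 * v[k] ** 2
        z[k] = -c1 * z[k - 1] - c2 * z[k - 2] + e1 * u[k - 1] + e2 * u[k - 2]
    # asymmetric Laplace noise via the Gaussian-exponential representation
    mix = rng.exponential(1.0, size=n)
    noise = alpha * mix + np.sqrt(mix * sigma2) * rng.normal(size=n)
    y = z + noise
    # replace a fraction of the samples by outliers
    idx = rng.choice(n, size=int(outlier_frac * n), replace=False)
    y[idx] = 10.0 * z[idx]
    return u, v, y, z, idx
```

Keeping the outlier indices `idx` makes it possible to check afterwards whether the samples with the smallest weights $E_{2}$ are indeed the contaminated ones.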

Remark 2. The robustness to outliers and asymmetric noise comes from equation (31). In terms of outliers, compared with the algorithm based on the Gaussian distribution (equation (37)),²⁰ equation (31) contains the weight coefficient $E_{2}$ in front of the residual $y_{k} - {\hat{ϕ}}_{k}^{T} θ$ , which can identify the outliers in the training data. By equation (26), once the estimated parameters are close to the ground truth parameters after several iterations, the weight coefficient of a sampling point decreases as the deviation between its estimated output and collected output grows, and a larger deviation means that the sampling point is more likely to be an outlier. Thus, the sampling points with larger deviations are given smaller weight coefficients to weaken the influence of outliers.

Figure 2.

The input data (scheduling variable) and output data.

Figure 3.

The convergence procedures.

As for the robustness to asymmetric noise, the term $E_{1} α$ in equation (31) represents the identified skewness, and therefore eliminates the effect of asymmetry. Of course, $α$ equals zero when the data is only polluted by symmetric noise.

\frac{dQ (θ)}{d θ} = \frac{2}{L} {\sum_{k = 1}^{L} (y_{k} - {\hat{ϕ}}_{k}^{T} θ) {\hat{ϕ}}_{k}}

(37)

Most robust LPV model identification methods focus on robustness to outliers but not to other kinds of noise. Specifically, the parameter estimation algorithms are deduced based on the Laplace distribution¹⁷ and Student’s t distribution¹⁵ since these two distributions are heavy-tailed. However, they can only model symmetric noise. If the training data is polluted by asymmetric noise, the estimated parameters are biased. In contrast, the proposed algorithm is robust to both symmetric and asymmetric noise with outliers.

The numerical example is also employed to verify the robustness to asymmetric noise. For comparison, two parameter estimation algorithms proposed in Yang and Yang²¹ and Liu et al.²² are applied to the same training data, which is polluted by asymmetric noise with outliers. The first algorithm is deduced based on the Gaussian distribution and is therefore not robust. In the second algorithm, the Laplace distribution is employed to model the noise, so the algorithm is robust to outliers. However, neither of these two algorithms is robust to asymmetric noise. The parameters identified by the three algorithms as well as the ground truth parameters are shown in Table 1.

Table 1.

The comparison of estimated parameters with true parameters.

	$c_{1}$
	$c_{1, 1}$	$c_{1, 2}$	$c_{1, 3}$
Ground truth	1	−0.5	−0.1
The algorithm based on SAL distribution	0.9983	−0.4901	−0.1066
The algorithm based on Gaussian distribution	1.0284	−0.5134	−0.1389
The algorithm based on Laplace distribution	0.9737	−0.4875	−0.0959
	$e_{1}$
	$e_{1, 1}$	$e_{1, 2}$	$e_{1, 3}$
Ground truth	0.5	−0.4	0.01
The algorithm based on SAL distribution	0.4880	−0.4227	0.0245
The algorithm based on Gaussian distribution	0.3960	−0.5924	0.1245
The algorithm based on Laplace distribution	0.5496	−0.3937	−0.0849
	$c_{2}$
	$c_{2, 1}$	$c_{2, 2}$	$c_{2, 3}$
Ground truth	0.5	−0.7	−0.1
The algorithm based on SAL distribution	0.4959	−0.6907	−0.0991
The algorithm based on Gaussian distribution	0.4761	−0.7678	−0.1344
The algorithm based on Laplace distribution	0.4756	−0.7131	−0.0849
	$e_{2}$
	$e_{2, 1}$	$e_{2, 2}$	$e_{2, 3}$
Ground truth	0.5	−0.3	0.02
The algorithm based on SAL distribution	0.4879	−0.1854	−0.0279
The algorithm based on Gaussian distribution	0.4563	−0.3413	−0.0493
The algorithm based on Laplace distribution	0.4786	−0.1575	−0.0496

The results show that the proposed algorithm performs better than the other two algorithms. The algorithm based on the Gaussian distribution is the least robust because the Gaussian distribution is a symmetric, light-tailed distribution and cannot model the outliers and asymmetric noise in the data. The algorithm based on the Laplace distribution performs better than its Gaussian counterpart because the Laplace distribution is heavy-tailed, and its heavy tail can model the outliers in the training data. However, it is still a symmetric distribution, so it is not robust to asymmetric noise. The histogram of the noise in the numerical example is shown in Figure 4. The vast majority of the noise samples are greater than zero. This kind of asymmetry, which is common in industrial data, cannot be modeled by the Laplace distribution. The SAL distribution can model such noise, and therefore the proposed algorithm is more robust and identifies the parameters accurately.

Figure 4.

The histogram of the noise.

Conclusion

In this work, a parameter estimation algorithm for LPV–OE models is derived based on the asymmetric Laplace distribution and the EM algorithm. The proposed algorithm is robust to both asymmetric noise and outliers, and its effectiveness is verified by a numerical example.

The advantages of the proposed algorithm come from the asymmetric Laplace distribution. Compared with the Gaussian distribution it has a heavy tail, and compared with the Laplace distribution it allows asymmetry. Thus, it can model more general noise, including both asymmetric and symmetric noise with outliers. To make the asymmetric Laplace distribution tractable, it is represented as the product of an exponential distribution and a Gaussian distribution. When the random variable that obeys the exponential distribution is selected as the latent variable in the EM algorithm framework, the posterior distribution of the latent variable can be derived and takes the form of a GIG distribution. In this way, the parameter estimation algorithm can be derived. Finally, a numerical example verifies that the proposed algorithm can identify the parameters of LPV–OE models when the training data are polluted by asymmetric noise and outliers.
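The Gaussian-exponential representation mentioned above can be checked numerically with a short sketch. It assumes the standard mixture form given in Kotz et al.,¹⁹ $ε = α z + σ \sqrt{z} g$ with $z$ drawn from a unit-rate exponential distribution and $g$ from a standard Gaussian; the function name `sample_al_noise`, the scale symbol `sigma`, and the chosen values are illustrative assumptions rather than the settings of the simulation example.

```python
import numpy as np


def sample_al_noise(n, alpha, sigma, rng):
    """Draw n asymmetric Laplace samples via the Gaussian-exponential mixture.

    Assumed form: eps = alpha * z + sigma * sqrt(z) * g,
    with z ~ Exp(1) (latent variable) and g ~ N(0, 1).
    alpha controls the skewness; alpha = 0 gives symmetric Laplace noise.
    """
    z = rng.exponential(scale=1.0, size=n)   # latent exponential variable
    g = rng.standard_normal(n)               # Gaussian component
    return alpha * z + sigma * np.sqrt(z) * g


rng = np.random.default_rng(0)
alpha, sigma = 1.0, 0.5                      # hypothetical skewness and scale
noise = sample_al_noise(200_000, alpha, sigma, rng)

# Theoretical moments of this mixture: mean = alpha, variance = alpha^2 + sigma^2.
print("sample mean      :", noise.mean(), " theory:", alpha)
print("sample variance  :", noise.var(), " theory:", alpha**2 + sigma**2)
print("fraction positive:", (noise > 0).mean())
```

With a positive skewness parameter most samples fall above zero, which is qualitatively the situation shown in Figure 4, and setting `alpha = 0` recovers a symmetric heavy-tailed distribution.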

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Joint Funds of the National Natural Science Foundation of China under Grant U20A20188.

ORCID iDs

Chao Xu

Xianqiang Yang

References

1. Zhu, Zheng WX. Multiple Lyapunov functions analysis approach for discrete-time-switched piecewise-affine systems under dwell-time constraints. IEEE Trans Automat Contr 2019; 65: 2177–2184.

2. Daryabak, Filizadeh, Jatskevich, et al. Modeling of LCC-HVDC systems using dynamic phasors. IEEE Trans Power Deliv 2014; 29: 1989–1998.

3. Saad, Peralta, Dennetire, et al. Dynamic averaged and simplified models for MMC-based HVDC transmission systems. IEEE Trans Power Deliv 2013; 28: 1723–1730.

4. Zhu, Zheng, Zhou. Quasi-synchronization of discrete-time Lur’e-type switched systems with parameter mismatches and relaxed PDT constraints. IEEE Trans Cybern 2019; 50: 2026–2037.

5. Shao. Immersion and invariance adaptive pose control for spacecraft proximity operations under kinematic and dynamic constraints. IEEE Trans Aerosp Electron Syst. Epub ahead of print 20 January 2021. DOI: 10.1109/taes.2021.3053134.

6. Cerone, Piga, Regruto, et al. Minimal LPV state-space realization driven set-membership identification. In: 2012 American control conference (ACC), Montreal, QC, Canada, 27–29 June 2012, pp.3421–3426. New York: IEEE.

7. Zhu, Zheng WX. Observer-based control for cyber-physical systems with periodic DoS attacks via a cyclic switching strategy. IEEE Trans Automat Contr 2019; 65: 3714–3721.

8. Deng, Huang. Identification of nonlinear parameter varying systems with missing output data. AIChE J 2012; 58: 3454–3467.

9. Yang, Liu. Multimodel approach to robust identification of multiple-input single-output nonlinear time-delay systems. IEEE Trans Industr Inform 2019; 16: 2413–2422.

10. Liu, Yang. Robust variational inference for LPV dual-rate systems with randomly delayed outputs. IEEE Trans Instrum Meas 2021; 70: 1–9.

11. Gebraad PMO, van Wingerden, van der Veen, et al. LPV subspace identification using a novel nuclear norm regularization method. In: Proceedings of the 2011 American control conference, San Francisco, CA, 29 June–1 July 2011, pp.165–170. New York: IEEE.

12. Shao, Shi, et al. Data-driven immersion and invariance adaptive attitude control for rigid bodies with double-level state constraints. IEEE Trans Control Syst Technol. Epub ahead of print 12 May 2021. DOI: 10.1109/tcst.2021.3076439.

13. Yang, Liu, Yin. Robust identification of nonlinear systems with missing observations: the case of state-space model structure. IEEE Trans Industr Inform 2018; 15: 2763–2774.

14. Liu, Han, Yang. Robust global identification of LPV errors-in-variables systems with incomplete observations. IEEE Trans Syst Man Cybern Syst. Epub ahead of print 7 May 2021. DOI: 10.1109/tsmc.2021.3071137.

15. Yang, Yin, Kaynak. Robust identification of LPV time-delay system with randomly missing measurements. IEEE Trans Syst Man Cybern Syst 2017; 48: 2198–2208.

16. Hamdi, Amairi, Aoun. Orthotopic approach of set-membership parameters estimation for LPV system using fractional models. In: 2017 18th international conference on sciences and techniques of automatic control and computer engineering (STA), Monastir, Tunisia, 21–23 December 2017, pp.261–266. New York: IEEE.

17. Yang, Yin. Robust global identification and output estimation for LPV dual-rate systems subjected to random output time-delays. IEEE Trans Industr Inform 2017; 13: 2876–2885.

18. Yang, Yan. Robust global identification of linear parameter varying systems with generalised expectation–maximisation algorithm. IET Control Theory Appl 2015; 9: 1103–1110.

19. Kotz, Kozubowski, Podgorski. The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance. New York, NY: Springer Science & Business Media, 2012.

20. Yang, Xiong, Huang, et al. Identification of linear parameter varying systems with missing output data using generalized expectation-maximization algorithm. IFAC Proc Vol 2014; 47: 9364–9369.

21. Yang. Local identification of LPV dual-rate system with random measurement delays. IEEE Trans Ind Electron 2017; 65: 1499–1507.

22. Liu, Yang, Zhu, et al. Robust identification of nonlinear time-delay system in state-space form. J Franklin Inst 2019; 356: 9953–9971.

Robust LPV models identification approach based on shifted asymmetric Laplace distribution
