Effective fault detection in structural health monitoring systems

Abstract

This article presents a new fault detection technique based on kernel partial least squares, the exponentially weighted moving average, and the generalized likelihood ratio test. The approach aims to improve the monitoring of structural systems. It computes an optimal statistic that merges current and past information while giving more weight to the most recent information. To further improve the performance of the kernel partial least squares model, a multiscale representation of the data is used to develop a multiscale extension of the method. Multiscale representation is a powerful data analysis tool that efficiently separates deterministic features from random noise. The resulting multiscale kernel partial least squares method combines the advantages of kernel partial least squares with those of multiscale representation to enhance structural modeling performance. The effectiveness of the proposed approach is assessed using two examples: synthetic data and a benchmark structure. The simulation study shows that the developed technique outperforms the classical detection approaches in terms of false alarm rate, missed detection rate, and detection speed.

Keywords

Fault detection, multiscale kernel partial least squares, generalized likelihood ratio test, exponentially weighted moving average, structural health monitoring

Introduction

Structural health monitoring (SHM) is an approach to monitoring structural systems. It consists of developing monitoring solutions based on data collected by sensors installed on the structure. Its purpose is to detect, identify, estimate, and/or locate existing faults on the structure and to support decisions about its state and evolution in order to improve its safety. In the current work, we focus on the fault detection (FD) problem.

Many fault detection methods have been considered in the literature.^1–3 They can be classified as model-based methods^4–6 or data-driven methods.^7–12 Many physical systems in actual civil structures are complex, which makes their models difficult to construct.^13,14 Therefore, data-driven methods are widely used for monitoring and FD purposes.¹⁵

Latent variable regression (LVR) is a well-known family of data-driven modeling techniques that includes principal component analysis (PCA)¹⁶ and partial least squares (PLS).¹⁷ LVR methods are multivariate analysis techniques that reduce the dimensionality of the data. They rely on a linear data transformation defined by an orthonormal matrix that is computed from the data set itself.

However, most practical systems are input–output, nonlinear, and multivariate. To handle such systems, kernel partial least squares (KPLS) is used for modeling. The benefit of the KPLS model is that it does not require nonlinear optimization; it only requires the solution of an eigenvalue problem.

To further improve the performance of the KPLS model, a multiscale representation of the data is used to develop a multiscale kernel partial least squares (MSKPLS) method. Multiscale representation is a powerful data analysis tool that efficiently separates deterministic features from random noise.

Once the model is built, a monitoring technique is needed to detect faults in SHM systems. To that end, a novel detection chart based on the generalized likelihood ratio test (GLRT) and the exponentially weighted moving average (EWMA) chart statistics is applied. The EWMA-GLRT chart evaluates the residuals obtained from the MSKPLS model to detect faults in the network.

The benefit of the proposed EWMA-GLRT chart is a detection statistic that considers both current and past information and gives more weight to the more recent information. The developed FD technique therefore uses MSKPLS for modeling and the EWMA-GLRT chart for detection and monitoring. The technique is evaluated and compared with several other techniques using two examples: synthetic data and a benchmark structure. The results show that the proposed technique outperforms the classical approaches in terms of false alarm rate (FAR), missed detection rate (MDR), and detection speed.

The rest of this article is arranged as follows. Section “Multiscale KPLS description” describes the multiscale KPLS method. Section “OEWMA-GLRT detection chart description” presents the new optimized EWMA-GLRT (OEWMA-GLRT) chart. In section “Applications,” the effectiveness of the proposed approach is evaluated using synthetic data and simulated SHM benchmark data. Finally, section “Conclusion” presents the conclusions.

Multiscale KPLS description

KPLS

Consider an input matrix $X \in R^{N \times m}$ consisting of N samples of m process variables and an output matrix $Y \in R^{N \times p}$ consisting of N observations of p product quality variables. The PLS technique projects the available process measurements X onto a low-dimensional space with $ℓ$ latent variables, where $ℓ$ can be determined by a criterion such as the cumulative percent variance (CPV). Y can then be reconstructed from these latent variables. The PLS model can be expressed¹⁸ as follows

\begin{array}{l} X = T P^{T} + E = \sum_{i = 1}^{ℓ} t_{i} p_{i}^{T} + E \\ Y = U Q^{T} + F = \sum_{j = 1}^{ℓ} u_{j} q_{j}^{T} + F \end{array}

(1)

where $T = [\begin{matrix} t_{1} & t_{2} & \dots & t_{ℓ} \end{matrix}] \in R^{N \times ℓ}$ and $U = [\begin{matrix} u_{1} & u_{2} & \dots & u_{ℓ} \end{matrix}] \in R^{N \times ℓ}$ are the input and output score matrices, respectively. $P = [\begin{matrix} p_{1} & p_{2} & \dots & p_{ℓ} \end{matrix}] \in R^{m \times ℓ}$ and $Q = [\begin{matrix} q_{1} & q_{2} & \dots & q_{ℓ} \end{matrix}] \in R^{p \times ℓ}$ are the input and output loading matrices, respectively. $E \in R^{N \times m}$ and $F \in R^{N \times p}$ denote the residual parts of X and Y, respectively, which are obtained from the PLS model.

Note that the matrices X and Y are standardized to zero mean and unit variance before the PLS model is constructed. PLS assumes linear correlation among variables, which leads to prediction and modeling errors for nonlinear processes. To address this issue and handle nonlinear input–output models, several nonlinear extensions have been proposed. In particular, PLS has been extended to nonlinear regression through the use of kernel functions. The best-known approach is KPLS, which is considered in the next section.

KPLS model

The KPLS method applies classical PLS directly in a feature space. The idea is first to map the original nonlinear data into a high-dimensional feature space in which the relationships are approximately linear, and then to perform linear PLS in that space. The main idea of the KPLS method is illustrated in Figure 1.

Figure 1.

Basic idea of KPLS.

Define $ϕ$ as a nonlinear map that projects the input vectors from the original space into the feature space $F$, in which they are approximately linearly related. After the nonlinear mapping, the input matrix X becomes the feature matrix $Φ$

Φ = {[\begin{matrix} ϕ (x_{1}) & ϕ (x_{2}) \dots ϕ (x_{N}) \end{matrix}]}^{T} \in R^{N \times f}

(2)

where the dimension f can be arbitrarily large or even infinite.

In the feature space, $ϕ (x_{i})$ $(i = 1, 2, . . ., N)$ should be scaled to zero mean

\bar{ϕ} (x_{i}) = ϕ (x_{i}) - \bar{ϕ}

(3)

where

\begin{matrix} \bar{ϕ} = \frac{1}{N} \sum_{i = 1}^{N} ϕ (x_{i}) \\ = \frac{1}{N} [\begin{matrix} ϕ (x_{1}) & ϕ (x_{2}) & \dots & ϕ (x_{N}) \end{matrix}] I_{N} \\ = \frac{1}{N} Φ^{T} I_{N} \end{matrix}

(4)

with $I_{N} = [1 \dots 1]^{T} \in R^{N}$ .

Thus, the mean-centered feature matrix $\bar{Φ}$ is

\begin{matrix} \bar{Φ} = {[\begin{matrix} \bar{ϕ} (x_{1}) & \bar{ϕ} (x_{2}) \dots \bar{ϕ} (x_{N}) \end{matrix}]}^{T} \\ = Φ - {[\begin{matrix} \bar{ϕ} & \bar{ϕ} \dots \bar{ϕ} \end{matrix}]}^{T} \\ = Φ - \frac{1}{N} I_{N} I_{N}^{T} Φ \end{matrix}

(5)

According to Sheriff et al.,¹⁹ the KPLS model of $(\bar{Φ}, Y)$ is modeled as follows

\begin{matrix} \bar{Φ} = T P^{T} + {\bar{Φ}}_{r} \\ Y = U Q^{T} + Y_{r} \end{matrix}

(6)

where $T \in R^{N \times A}$ and $U \in R^{N \times A}$ are the input and output score matrices, respectively. $P \in R^{f \times A}$ and $Q \in R^{p \times A}$ are the loading matrices of $\bar{Φ}$ and Y, respectively. ${\bar{Φ}}_{r}$ and $Y_{r}$ are residuals. A is the number of latent variables in the feature space, determined by the CPV criterion.¹⁹

Note that $ϕ (.)$ is not explicitly defined and that the dimension f of $Φ$ is arbitrarily large or even infinite. To avoid using $Φ$ explicitly, a kernel matrix K is introduced, defined as

K = Φ Φ^{T}

(7)

where the entry $k_{ij}$ of the matrix K is given by

k_{ij} = ϕ^{T} (x_{i}) ϕ (x_{j}) = k (x_{i}, x_{j})

(8)

Thus, the kernel matrix K is given by

\begin{matrix} K = Φ Φ^{T} \\ = [\begin{matrix} ϕ^{T} (x_{1}) ϕ (x_{1}) & \dots & ϕ^{T} (x_{1}) ϕ (x_{N}) \\ ⋮ & ⋱ & ⋮ \\ ϕ^{T} (x_{N}) ϕ (x_{1}) & \dots & ϕ^{T} (x_{N}) ϕ (x_{N}) \end{matrix}] \\ = [\begin{matrix} k (x_{1}, x_{1}) & \dots & k (x_{1}, x_{N}) \\ ⋮ & ⋱ & ⋮ \\ k (x_{N}, x_{1}) & \dots & k (x_{N}, x_{N}) \end{matrix}] \end{matrix}

(9)

The function $k (., .)$ is called the kernel function. Several types of kernel functions exist.^20,21 The Gaussian kernel, one of the most widely used, is defined as

k (x_{i}, x_{j}) = \exp (- \frac{{(x_{i} - x_{j})}^{T} (x_{i} - x_{j})}{c})

(10)

where c is the width of a Gaussian function.

The kernel matrix K is centered as

\bar{K} = \bar{Φ} {\bar{Φ}}^{T}

(11)

According to equation (5), $\bar{K}$ can be written as

\begin{matrix} \bar{K} = [Φ - \frac{1}{N} I_{N} I_{N}^{T} Φ] {[Φ - \frac{1}{N} I_{N} I_{N}^{T} Φ]}^{T} \\ = [E - \frac{1}{N} I_{N} I_{N}^{T}] Φ Φ^{T} {[E - \frac{1}{N} I_{N} I_{N}^{T}]}^{T} \\ = [E - \frac{1}{N} I_{N} I_{N}^{T}] K {[E - \frac{1}{N} I_{N} I_{N}^{T}]}^{T} \end{matrix}

(12)

where E denotes the $N \times N$ identity matrix.
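Equations (10) and (12) can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function names, the random data, and the kernel width `c = 2.0` are illustrative choices and do not come from the article.

```python
import numpy as np


def gaussian_kernel_matrix(X, c):
    """Kernel matrix with entries exp(-||x_i - x_j||^2 / c), as in equation (10)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / c)


def center_kernel(K):
    """Apply equation (12): (E - 1 1^T / N) K (E - 1 1^T / N)^T."""
    N = K.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N
    return C @ K @ C.T


# Illustrative data only: 6 samples of 3 variables (assumed values).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
K = gaussian_kernel_matrix(X, c=2.0)
K_bar = center_kernel(K)
```

A quick sanity check on such a sketch is that every row and column of the centered kernel matrix sums to zero, which reflects the zero-mean feature vectors of equation (3).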

From the centered kernel matrix $\bar{K}$ and the centered output matrix Y, the regression coefficient matrix B can be obtained as^22,23

B = {\bar{Φ}}^{T} U (T^{T} \bar{K} U)^{- 1} T^{T} Y

(13)

The prediction of the response variables on the training data is given by

\hat{Y} = \bar{Φ} B = \bar{K} U (T^{T} \bar{K} U)^{- 1} T^{T} Y

(14)

Equation (14) shows that the response variables (outputs) can be obtained from the inner products of the mapped vectors. Hence, for a new observation x of the predictor vector, the outputs are estimated by

\begin{matrix} \hat{y} = B^{T} \bar{ϕ} (x) = Y^{T} T {[U {(T^{T} \bar{K} U)}^{- 1}]}^{T} \bar{Φ} \bar{ϕ} (x) \\ = Y^{T} T {[U {(T^{T} \bar{K} U)}^{- 1}]}^{T} \bar{k} (x) \end{matrix}

(15)

where $\bar{k} (x)$ is the vector of centered kernel functions. It is defined as

\begin{matrix} \bar{k} (x) = \bar{Φ} \bar{ϕ} (x) \\ = {[\begin{matrix} \bar{ϕ} (x_{1}) & \dots & \bar{ϕ} (x_{N}) \end{matrix}]}^{T} \bar{ϕ} (x) \\ = {[\begin{matrix} {\bar{ϕ}}^{T} (x_{1}) \bar{ϕ} (x) & \dots & {\bar{ϕ}}^{T} (x_{N}) \bar{ϕ} (x) \end{matrix}]}^{T} \\ = {[\begin{matrix} \bar{k} (x_{1}, x) & \dots & \bar{k} (x_{N}, x) \end{matrix}]}^{T} \end{matrix}

(16)

From equations (4), (5), and (16), the centered kernel vector can be written as

\begin{matrix} \bar{k} (x) = \bar{Φ} \bar{ϕ} (x) \\ = [Φ - \frac{1}{N} I_{N} I_{N}^{T} Φ] [ϕ (x) - \frac{1}{N} Φ^{T} I_{N}] \\ = [E - \frac{1}{N} I_{N} I_{N}^{T}] Φ [ϕ (x) - \frac{1}{N} Φ^{T} I_{N}] \\ = [E - \frac{1}{N} I_{N} I_{N}^{T}] [k (x) - \frac{1}{N} K I_{N}] \end{matrix}

(17)

where $k (x)$ is the vector of non-centered kernel functions, defined analogously to $\bar{k} (x)$, and E is the identity matrix as in equation (12). The main steps of KPLS modeling are given in Algorithm 1.

Algorithm 1. KPLS algorithm.

1. Set $i = 1$, $K_{1} = K$, and $Y_{1} = Y$.
2. Initialize the score vector $u_{i}$ $(N \times 1)$ of $Y_{i}$ as the maximum-variance column of $Y_{i}$.
3. Compute the score vector $t_{i}$ $(N \times 1)$ of $Φ_{i}$ as $t_{i} = K_{i} u_{i} / \| K_{i} u_{i} \|$, so that $\| t_{i} \| = 1$.
4. Regress the columns of $Y_{i}$ on $t_{i}$: $c_{i} = Y_{i}^{T} t_{i}$, where $c_{i}$ is a weighting vector.
5. Calculate the new score vector: $u_{i} = Y_{i} c_{i} / \| Y_{i} c_{i} \|$, so that $\| u_{i} \| = 1$.
6. Repeat steps (3) to (5) until $t_{i}$ converges.
7. Deflate the matrices: $K_{i + 1} = (I - t_{i} t_{i}^{T}) K_{i} (I - t_{i} t_{i}^{T})$ and $Y_{i + 1} = Y_{i} - t_{i} t_{i}^{T} Y_{i}$, where I is the identity matrix.
8. Save the score vectors in the matrices: $T \leftarrow t_{i}$, $U \leftarrow u_{i}$.
9. Set $i = i + 1$ and return to step (2). Stop when $i > ℓ$, with $ℓ$ being the selected number of latent variables.
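The iteration in Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names, the convergence tolerance, the toy data, and the kernel width are assumptions, and the weighting vector in step 4 is computed as the product of the transposed output matrix with the score vector so that the dimensions agree. The training prediction follows the kernel form of equation (14).

```python
import numpy as np


def kpls_scores(K_bar, Y, n_latent, tol=1e-10, max_iter=500):
    """Follow Algorithm 1 on a centered kernel matrix; return score matrices T and U."""
    N = K_bar.shape[0]
    K_i, Y_i = K_bar.copy(), Y.astype(float).copy()
    T = np.zeros((N, n_latent))
    U = np.zeros((N, n_latent))
    for i in range(n_latent):
        # Step 2: start from the maximum-variance column of Y_i.
        u = Y_i[:, [int(np.argmax(Y_i.var(axis=0)))]]
        t_prev = np.zeros((N, 1))
        for _ in range(max_iter):
            t = K_i @ u
            t = t / np.linalg.norm(t)              # step 3
            c = Y_i.T @ t                          # step 4
            u = Y_i @ c
            u = u / np.linalg.norm(u)              # step 5
            if np.linalg.norm(t - t_prev) < tol:   # step 6
                break
            t_prev = t
        # Step 7: deflate the kernel and output matrices.
        D = np.eye(N) - t @ t.T
        K_i = D @ K_i @ D
        Y_i = Y_i - t @ (t.T @ Y_i)
        # Step 8: store the score vectors.
        T[:, [i]] = t
        U[:, [i]] = u
    return T, U


def kpls_fit(K_bar, Y, n_latent):
    """Training predictions K_bar U (T^T K_bar U)^{-1} T^T Y, as in equation (14)."""
    T, U = kpls_scores(K_bar, Y, n_latent)
    Y_hat = K_bar @ U @ np.linalg.solve(T.T @ K_bar @ U, T.T @ Y)
    return T, U, Y_hat


# Toy nonlinear data (assumed): 40 samples, 2 inputs, 1 centered output.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 2))
sq = np.sum(X ** 2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / 2.0)  # width c = 2
C = np.eye(40) - np.ones((40, 40)) / 40
K_bar = C @ K @ C
y = np.sin(X[:, 0]) + X[:, 1] ** 2
Y = (y - y.mean())[:, None]
T, U, Y_hat = kpls_fit(K_bar, Y, n_latent=2)
```

Because of the deflation in step 7, the stored score vectors are mutually orthonormal, which gives a convenient check on any implementation of the algorithm.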

Multiscale KPLS technique

Teppola and Minkkinen²⁴ were the first to develop linear multiscale PLS, in which the multiscale representation is combined with the linear PLS model. It was applied mainly to remove drift from the data set. Zhang and Hu²⁵ studied the nonlinear extension (MKPLS). The wavelet-based multiscale representation of data can improve KPLS modeling. Similar to the multiscale principal component analysis (MSPCA) method proposed by Bakshi and Top,²⁶ we use the multiscale KPLS algorithm in this work. As with PLS, modeling with KPLS is carried out in the mapped feature space.

Given a training data set, the variables are decomposed through the discrete wavelet transform (DWT), and a KPLS model is applied at each individual scale. To reconstruct the data, the important scale coefficients are selected according to a statistical threshold (see Figure 2). A KPLS model is then applied to the reconstructed data, covering all scales, for FD. The statistical thresholding at each scale acts as a data filtering stage that improves KPLS efficiency. The MSKPLS algorithm is given in Algorithm 2 and its representation model is shown in Figure 2.

Figure 2.

Representation of MSKPLS model for FD.

Algorithm 2. MSKPLS algorithm.

1. Decompose the input and output data X and Y into a coarse approximation scale and detail scales.
2. At each scale, apply the KPLS algorithm and compute the loading vectors and score vectors of the X and Y matrices.
3. For X and Y, at each scale, select only the important coefficients (those higher than the threshold values).
4. Reconstruct X and Y.
5. Apply the KPLS algorithm to the reconstructed X and Y matrices.
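The decompose, threshold, and reconstruct steps of Algorithm 2 can be illustrated on a single variable. The sketch below uses only the Python standard library and makes several assumptions not stated in the article: it uses the orthonormal Haar wavelet, a fixed user-supplied threshold instead of a statistical one, and a signal whose length is divisible by two raised to the decomposition depth. The per-scale KPLS step is omitted.

```python
import math


def haar_decompose(signal, depth):
    """Return (approximation, [detail_1, ..., detail_depth]) for the orthonormal Haar DWT."""
    approx, details = list(signal), []
    for _ in range(depth):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest scale."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(signal, detail):
            rebuilt += [(s + d) / math.sqrt(2), (s - d) / math.sqrt(2)]
        signal = rebuilt
    return signal


def threshold_details(details, threshold):
    """Keep only detail coefficients whose magnitude exceeds the threshold."""
    return [[d if abs(d) > threshold else 0.0 for d in level] for level in details]


# Assumed toy signal: a step from 1 to 5 with small alternating noise.
signal = [1.1, 0.9, 1.1, 0.9, 5.1, 4.9, 5.1, 4.9]
approx, details = haar_decompose(signal, depth=2)
exact = haar_reconstruct(approx, details)
denoised = haar_reconstruct(approx, threshold_details(details, threshold=0.5))
```

In this toy case the small detail coefficients carry only the alternating noise, so removing them leaves the underlying step, while reconstruction without thresholding returns the original signal.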

Advantages of multiscale data representation

Multiscale representation has the ability to separate the noise from important features in the data. When data are decomposed at multiple scales by passing through low-pass and high-pass filters, noise is effectively separated from the important features. Random noise in a signal is normally present over all the coefficients, whereas deterministic features in the data are captured in a few but relatively large coefficients. The important features in the data are usually captured by the last scaled signals as well as any large wavelet coefficient (in the detail signals), whereas other small wavelet coefficients usually correspond to noise.²⁶ Thus, multiscale representation provides an effective method for noise-feature separation as shown in Figure 3.

Figure 3.

A schematic diagram of data representation at multiple scales: (a) original signal, (b) first scaled signal, (c) first detail signal, (d) second scaled signal, (e) second detail signal, (f) third scaled signal, and (g) third detail signal.²⁶

A further benefit of multiscale representation is that it helps satisfy the assumptions of independence, normality, and noise level made by several univariate and multivariate monitoring approaches.²⁷

The next section presents the MSKPLS-based optimized exponentially weighted moving average-generalized likelihood ratio test (OEWMA-GLRT) approach for FD. In the developed framework, the modeling phase is handled by MSKPLS and fault detection is handled by the OEWMA-GLRT chart. MSKPLS is used to compute the monitored residuals, and the OEWMA-GLRT chart (presented next) is applied to these residuals for FD.

OEWMA-GLRT detection chart description

The new detection chart combines the advantages of the EWMA and GLRT charts. It also optimizes the EWMA chart using a multi-objective function: the optimal EWMA tuning parameters (L and $λ$) are selected to reduce the MDR, the FAR, and the out-of-control average run length $(ARL_{1})$, which measures the detection delay.

Classical GLRT detection chart

The GLRT chart is a commonly used hypothesis testing technique for FD in model-based methods.^28,29 Consider an observation vector $Y \in R^{m}$ that follows one of two Gaussian distributions, $N (0; σ^{2} I_{m})$ or $N (θ \neq 0; σ^{2} I_{m})$, where $σ^{2} > 0$ is the known variance and $θ$ is the mean vector. The hypothesis testing problem can be defined as follows

\begin{cases} H_{0} : Y \sim N (0; σ^{2} I_{m}), & \text{(null hypothesis)} \\ H_{1} : Y \sim N (θ; σ^{2} I_{m}), & \text{(alternative hypothesis)} \end{cases}

(18)

The parameter $θ$ represents the fault value in our study. The GLRT statistic $T (Y)$ is obtained by maximizing the likelihood ratio over $θ$ as follows

\begin{matrix} T (Y) = 2 \log \frac{\sup_{θ \in R^{m}} f_{θ} (Y)}{f_{θ = 0} (Y)} \\ = 2 \log (\sup_{θ} \exp (- \frac{\| Y - θ \|_{2}^{2}}{2 σ^{2}}) / \exp (- \frac{\| Y \|_{2}^{2}}{2 σ^{2}})) \\ = \frac{1}{σ^{2}} (- \min_{θ} \| Y - θ \|_{2}^{2} + \| Y \|_{2}^{2}) \\ = \frac{1}{σ^{2}} (- \| Y - \hat{θ} \|_{2}^{2} + \| Y \|_{2}^{2}) = \frac{1}{σ^{2}} \| Y \|_{2}^{2} \end{matrix}

(19)

where $\hat{θ}$ represents the maximum likelihood estimate of $θ$. It is given by $\hat{θ} = \arg \min_{θ} \| Y - θ \|_{2}^{2} = Y$, where $\| . \|_{2}$ represents the Euclidean norm. The probability density function of Y is defined as

f_{θ} (Y) = \frac{1}{{(2 π)}^{\frac{m}{2}} σ^{m}} \exp \{- \frac{1}{2 σ^{2}} \| Y - θ \|_{2}^{2}\}

Note that the maximization of the likelihood function corresponds to the maximization of its logarithm. The distribution of the GLRT decision function $T (Y)$ under $H_{0}$ permits establishing a statistical test with a preferred FAR, $α$ , where the threshold $δ_{α}^{2}$ is selected to fulfill the probability of the false alarm as follows

P_{0} (T (Y) \geq δ_{α}^{2}) = α

(20)

where $P_{0} (A)$ represents the probability of an event A when Y is distributed according to the null hypothesis $H_{0}$. As Y follows a normal distribution (equation (18)), the statistic T follows a $χ^{2}$ distribution that is central under $H_{0}$ and non-central under $H_{1}$, with non-centrality parameter $κ_{θ} = \frac{1}{σ^{2}} \| θ \|_{2}^{2}$. The distribution of the GLRT statistic is therefore needed in order to choose a suitable threshold. Under $H_{0}$, the test statistic follows a chi-square law^30,31

T (Y) = \frac{1}{σ^{2}} \| Y \|_{2}^{2} \sim χ_{m}^{2}

(21)

The GLRT technique is derived to detect a shift in the mean (fault), and its explicit asymptotic statistic is computed. It is used to detect an additive fault $θ$ through computing the maximum probability of detection for a given false alarm.

In this work, the FD problem is considered as a hypothesis test taking into account two possible hypotheses: a null hypothesis $H_{0}$ of no change, where measurement vector $x$ has no fault, and an alternative hypothesis $H_{1}$ of the change point, where $x$ has a fault. Thus, the statistic has to choose between $H_{0}$ and $H_{1}$ for the best detection. The GLRT will be applied to residuals generated from the MSKPLS model.

From equation (18), let $Y = e_{j}$ for $j = 1, \dots, m$; the hypothesis testing problem can then be written as

\begin{cases} H_{0} : e_{j} \sim N (0, 1), & \text{(null hypothesis)} \\ H_{1} : e_{j} \sim N (θ, 1), & \text{(alternative hypothesis)} \end{cases}

(22)

and the GLRT statistic is presented as follows

G (e_{j}) = e_{j}^{2} \sim χ_{1, α}^{2}, \quad j = 1, \dots, m

(23)

To choose the threshold $(G_{α})$, the distribution of the statistic is needed. Since e is normally distributed, the GLRT statistic follows a chi-square distribution $χ^{2}$

G \sim χ^{2}

(24)

If the GLRT statistic exceeds the threshold $(G_{α})$, a fault is declared; otherwise, the process is considered to be operating under normal conditions.
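The decision rule on standardized residuals can be sketched as follows. This is an illustrative sketch using only the Python standard library; the function names, the significance level of 0.05, and the residual values are assumptions. It uses the fact that the square of a standard normal variable follows a chi-square distribution with one degree of freedom, so the threshold is the square of a normal quantile.

```python
from statistics import NormalDist


def glrt_threshold(alpha):
    """(1 - alpha) quantile of the chi-square law with 1 degree of freedom."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2


def glrt_flags(residuals, alpha=0.05):
    """Flag each standardized residual e_j whose statistic e_j^2 exceeds the threshold."""
    g_alpha = glrt_threshold(alpha)
    return [e * e > g_alpha for e in residuals]


# Assumed standardized residuals; the last two are large enough to be flagged.
residuals = [0.5, -1.0, 2.5, -3.0]
g_alpha = glrt_threshold(0.05)
flags = glrt_flags(residuals, alpha=0.05)
```

With a significance level of 0.05 the threshold is about 3.84, so a residual is flagged when its magnitude exceeds roughly 1.96.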

OEWMA-GLRT detection chart

The EWMA chart was first developed by Roberts³² in 1959. It can be computed as follows³³

Z_{i} = λ X_{i} + (1 - λ) Z_{i - 1}, i = 1, \dots, N

(25)

where $λ$ is a smoothing parameter $(0 < λ \leq 1)$ that controls the memory of the detection statistic, $X_{i}$ represents the ith individual observation value, and $Z_{0}$ represents the initial value, which is taken as the in-control mean $μ_{0}$.

The EWMA chart has two control limits: an upper control limit $(UCL)$ and a lower control limit $(LCL)$ . When $Z_{i}$ exceeds these control limits, the EWMA chart detects a fault.

These control limits are computed as follows³⁴

UCL = μ_{0} + L σ \sqrt{\frac{λ}{2 - λ} [1 - {(1 - λ)}^{2 i}]}

(26)

LCL = μ_{0} - L σ \sqrt{\frac{λ}{2 - λ} [1 - {(1 - λ)}^{2 i}]}

(27)

where L represents the control width of the EWMA statistic Z and $σ$ represents the standard deviation of X. For the EWMA method to perform well, the smoothing parameter $λ$ and the control width L need to be chosen carefully. In this article, we propose an optimized exponentially weighted moving average (OEWMA) statistic, which computes the EWMA statistic and its control limits with optimized parameters $\hat{λ}$ and $\hat{L}$.
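Equations (25) to (27) can be sketched as a short recursion. The example below uses only the Python standard library; the function name, the toy observations, and the parameter values (a smoothing parameter of 0.3 and a control width of 3) are assumptions chosen for illustration and are not the optimized values discussed in the article.

```python
import math


def ewma_chart(x, lam, L, mu0=0.0, sigma=1.0):
    """Return a list of (Z_i, LCL_i, UCL_i, alarm_i) following equations (25)-(27)."""
    z, out = mu0, []
    for i, xi in enumerate(x, start=1):
        z = lam * xi + (1.0 - lam) * z
        half_width = L * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        )
        alarm = z > mu0 + half_width or z < mu0 - half_width
        out.append((z, mu0 - half_width, mu0 + half_width, alarm))
    return out


# Assumed toy data: in-control for 5 samples, then a mean shift of 3 sigma.
x = [0.0] * 5 + [3.0] * 5
chart = ewma_chart(x, lam=0.3, L=3.0)
flags = [alarm for _, _, _, alarm in chart]
```

In this toy run the statistic needs two faulty samples to cross the upper control limit, which shows how a small smoothing parameter trades detection delay for noise suppression.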

To compute the optimal values $\hat{λ}$ and $\hat{L}$, we use multi-objective optimization (MOO).³⁵ The MOO is handled using three objective functions: MDR, FAR, and average run length $(ARL_{1})$.³⁵

The OEWMA-GLRT statistic $(T)$ can be computed as

T (Y) = α \frac{G (Y)}{G_{α}} + (1 - α) \frac{Z (Y)}{UCL}

(28)

where $G_{α}$ is the control limit of GLRT, UCL is the upper control limit of EWMA statistic, and $α$ is the weight parameter $(0 < α < 1)$ .
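As a worked instance of equation (28), the sketch below combines a GLRT value and an EWMA value into a single statistic. The function name and all numerical values are assumptions chosen for illustration; the article does not specify them here.

```python
def oewma_glrt_statistic(g, g_alpha, z, ucl, weight):
    """Equation (28): weighted sum of the GLRT and EWMA statistics, each scaled by its limit."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    return weight * g / g_alpha + (1.0 - weight) * z / ucl


# Assumed values: the GLRT statistic is twice its limit, the EWMA statistic is
# at 40% of its upper control limit, and both terms are weighted equally.
T_value = oewma_glrt_statistic(g=7.68, g_alpha=3.84, z=0.5, ucl=1.25, weight=0.5)
```

Here the combined value is 0.5 × 2 + 0.5 × 0.4 = 1.2. Dividing each term by its own limit puts the two statistics on a comparable scale before they are weighted.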

To detect faults in the residual vector R obtained from the MSKPLS model, the MSKPLS-based OEWMA-GLRT is proposed. Filtering and detection are two independent phases of the FD algorithm (see Figure 4). Nevertheless, implementing these two tasks simultaneously may improve FD performance. Thus, we develop the MSKPLS-based OEWMA-GLRT, in which the KPLS model is built on the wavelet coefficients and the OEWMA-GLRT chart is applied to detect faults.

Figure 4.

Schematic illustration of the developed MSKPLS-based OEWMA-GLRT approach.

The proposed MSKPLS-based OEWMA-GLRT technique for FD is described in Algorithm 3.

Algorithm 3. MSKPLS-based OEWMA-GLRT algorithm.

1. Split the data into a training set and a testing set.
Training data:
2. Standardize the data to have zero mean and unit variance.
3. Decompose the X and Y data into a coarse approximation scale and detail scales after selecting the best decomposition depth.
4. Apply KPLS at each scale.
5. Reconstruct the X and Y data with KPLS.
6. Construct the MSKPLS model.
7. Compute the OEWMA-GLRT statistic T.
8. Compute the OEWMA-GLRT threshold $T_{t}$.
Testing data:
9. Standardize the data to have zero mean and unit variance.
10. Decompose the X and Y data into a coarse approximation scale and detail scales after selecting the best decomposition depth.
11. Apply KPLS at each scale.
12. Reconstruct the X and Y data with KPLS.
13. Construct the MSKPLS model.
14. Compute the OEWMA-GLRT statistic T.
15. If the statistic T is below the threshold $T_{t}$, the system is under normal conditions. Otherwise, a fault is declared.

Applications

The developed approach is validated through two simulated examples: a synthetic data set¹⁰ and the benchmark model of the International Association for Structural Control-American Society of Civil Engineers (IASC-ASCE).^36–39

FD using simulated example

The simulated example proposed in Bakshi⁴⁰ contains four variables where the first variable follows a normal distribution with zero mean and unit variance. The simulation is presented in the following equations

\begin{cases} X_{t_{1}} = [- 2 \times randn (200, 1); \; randn (300, 1); \; 3 \times randn (500, 1)] \\ X_{t_{2}} = \sin {(0.1 : 0.1 : 100)}^{T} + \cos {(0.1 : 0.1 : 100)}^{T} \\ X_{t_{3}} = 0.5 \times X_{t_{1}}^{2} + 0.5 \times X_{t_{2}} \\ X_{t_{4}} = 0.3 \times X_{t_{1}} + 0.7 \times X_{t_{2}}^{2} \end{cases}

(29)

The input data matrix $\tilde{X}$ is contaminated with zero-mean Gaussian measurement noise $v_{k - 1} \sim N (0, 0.2)$, so that

\begin{cases} X_{1} (t) = X_{t_{1}} + σ_{v} \times randn (size (X_{t_{1}})) \\ X_{2} (t) = X_{t_{2}} + σ_{v} \times randn (size (X_{t_{2}})) \\ X_{3} (t) = X_{t_{3}} + σ_{v} \times randn (size (X_{t_{3}})) \\ X_{4} (t) = X_{t_{4}} + σ_{v} \times randn (size (X_{t_{4}})) \end{cases}

(30)

The four model variables generated in the previous equation will be arranged in a matrix of the form

X (t) = [X_{1} (t), X_{2} (t), X_{3} (t), X_{4} (t)]

This matrix consists of four columns presenting the variables and $1000$ rows presenting the samples that express the evolution of the model over time. The four input variables’ responses are shown in Figure 5.

Figure 5.

The time evolution of the input variables.

The output variable is arranged in a data matrix Y, generated according to the following equation

Y (t) = X^{2} (t)

(31)

To construct the KPLS model, the input and output data are first scaled. The data set is then divided into training data, consisting of $500$ samples, and testing data, consisting of $500$ samples, which are used to validate the performance of the developed FD technique.

To illustrate the performance of the proposed OEWMA-GLRT chart, we compare it with the GLRT chart and the OEWMA chart. The comparison covers the linear and nonlinear cases, handled by the PLS model and the KPLS model, respectively. In the KPLS model, the radial basis function (RBF) kernel is used with parameter $η = 6$. The number of retained kernel principal components is set to 2 using the CPV criterion. To test the method, a mean shift fault of size $4 σ$ is added to the first variable $X_{1}$ of the testing data set over samples 400 to 500. The detection results of the OEWMA-GLRT, OEWMA, and GLRT methods for the PLS and KPLS models are shown in Figures 6–11 and Table 1.
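The data generation, scaling, split, and fault injection described above can be sketched with NumPy. Several points are assumptions rather than statements from the article: the random seed, treating 0.2 as the noise variance, reading equation (31) as an element-wise square, using the first 500 samples for training, and taking $σ$ in the fault size as the standard deviation of the scaled test variable.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed (assumption)
t = np.arange(1, 1001) * 0.1         # the grid 0.1 : 0.1 : 100

# Noise-free variables of equation (29).
xt1 = np.concatenate([
    -2.0 * rng.standard_normal(200),
    rng.standard_normal(300),
    3.0 * rng.standard_normal(500),
])
xt2 = np.sin(t) + np.cos(t)
xt3 = 0.5 * xt1 ** 2 + 0.5 * xt2
xt4 = 0.3 * xt1 + 0.7 * xt2 ** 2

# Measurement noise of equation (30); 0.2 is assumed to be a variance.
sigma_v = np.sqrt(0.2)
X = np.column_stack([
    x + sigma_v * rng.standard_normal(x.shape) for x in (xt1, xt2, xt3, xt4)
])
Y = X ** 2                           # equation (31), read element-wise (assumption)

# Scale to zero mean and unit variance, then split 500 / 500.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
Y_std = (Y - Y.mean(axis=0)) / Y.std(axis=0)
X_train, X_test = X_std[:500], X_std[500:]
Y_train, Y_test = Y_std[:500], Y_std[500:]

# Mean shift of 4 sigma on the first variable over test samples 400 to 500.
shift = 4.0 * X_test[:, 0].std()
X_test_faulty = X_test.copy()
X_test_faulty[399:500, 0] += shift
```

Keeping a fault-free copy of the testing set alongside the faulty one makes it straightforward to label each sample as normal or faulty when the detection metrics are computed later.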

Figure 6.

Results using PLS-based GLRT method.

Figure 7.

Results using PLS-based OEWMA method.

Figure 8.

Results using PLS-based OEWMA-GLRT method.

Figure 9.

Results using KPLS-based GLRT method.

Figure 10.

Results using KPLS-based OEWMA method.

Figure 11.

Results using KPLS-based OEWMA-GLRT method.

Table 1.

Comparison in terms of MDR, FAR, and $ARL_{1}$.

Model	GLRT MDR	GLRT FAR	GLRT $ARL_{1}$	OEWMA MDR	OEWMA FAR	OEWMA $ARL_{1}$	OEWMA-GLRT MDR	OEWMA-GLRT FAR	OEWMA-GLRT $ARL_{1}$
PLS	52.277	51.253	37	32.574	55.03	1	30.28	49.002	24
KPLS	22.772	13.759	1	10.02	12.02	1	9.802	11.759	1

GLRT: generalized likelihood ratio test; OEWMA: optimized exponentially weighted moving average; OEWMA-GLRT: optimized exponentially weighted moving average-generalized likelihood ratio test; MDR: missed detection rate; FAR: false alarm rate; PLS: partial least squares; KPLS: kernel partial least squares.

Figures 6–8 show that the PLS-based OEWMA-GLRT chart provides better detection results than both the PLS-based OEWMA and PLS-based GLRT charts. However, the results also show that detection based on the linear PLS model produces many false alarms and missed detections. This is because the PLS method assumes a linear relationship between variables, whereas the simulated synthetic example (29) is nonlinear, so the linear PLS method cannot handle the nonlinearity. To deal with this issue, the KPLS method is applied for modeling. Figures 9–11 and Table 1 show the detection performance based on the KPLS model. The improvement in detection performance of the KPLS model over the linear PLS model is shown in Table 1.

Figures 9–11 illustrate that the developed KPLS-based OEWMA-GLRT method gives the best detection quality compared with the classical KPLS-based GLRT and KPLS-based OEWMA charts. All of them outperform the detection charts based on the PLS model (see Figures 6–9). Table 1 illustrates this in terms of the detection metrics: MDR, FAR, and $ARL_{1}$ values.
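The three detection metrics can be computed from a sequence of alarms and the known fault labels. The sketch below uses commonly used definitions, which are assumptions here because the article does not spell them out: FAR is the percentage of fault-free samples that raise an alarm, MDR is the percentage of faulty samples that raise none, and $ARL_{1}$ is the number of samples from fault onset up to and including the first alarm. The function name and the toy sequences are illustrative.

```python
def detection_metrics(alarms, faulty):
    """Return (MDR in %, FAR in %, ARL_1 in samples) for boolean sequences of equal length."""
    n_faulty = sum(faulty)
    n_normal = len(faulty) - n_faulty
    missed = sum(1 for a, f in zip(alarms, faulty) if f and not a)
    false_alarms = sum(1 for a, f in zip(alarms, faulty) if a and not f)
    mdr = 100.0 * missed / n_faulty
    far = 100.0 * false_alarms / n_normal
    start = faulty.index(True)  # first faulty sample
    first = next((i for i in range(start, len(alarms)) if alarms[i]), None)
    arl1 = None if first is None else first - start + 1
    return mdr, far, arl1


# Assumed toy run: 6 normal samples followed by 4 faulty ones.
faulty = [False] * 6 + [True] * 4
alarms = [False, True, False, False, False, False, False, True, True, True]
mdr, far, arl1 = detection_metrics(alarms, faulty)
```

Under these definitions, an $ARL_{1}$ of 1 means the fault is flagged at its very first sample.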

Figures 12 and 13 and Table 2 show the effect of the multiscale representation on detection performance. Using the multiscale representation of the data may enhance the monitoring quality by reducing the MDR and the FAR. The wavelet-based multiscale representation of data may improve the detection performance of the KPLS- and PLS-based methods (Figures 8 and 11–13).

Figure 12.

Results using MSPLS-based OEWMA-GLRT method.

Figure 13.

Results using MSKPLS-based OEWMA-GLRT method.

Table 2.

Comparison in terms of MDR, FAR, and $ARL_{1}$.

Chart	MSPLS MDR	MSPLS FAR	MSPLS $ARL_{1}$	MSKPLS MDR	MSKPLS FAR	MSKPLS $ARL_{1}$
OEWMA-GLRT	38.316	44.85	1	0	2.005	1

MSPLS: multiscale partial least squares; MSKPLS: multiscale kernel partial least squares; MDR: missed detection rate; FAR: false alarm rate; OEWMA-GLRT: optimized exponentially weighted moving average-generalized likelihood ratio test.

FD using simulated benchmark process

The proposed approach is also validated using a nonlinear simulated benchmark structure.^36–39 The benchmark structures provided by the IASC-ASCE Structural Health Monitoring Group are used to verify the effectiveness of damage detection.

The benchmark provides an analytical model of the structure in the form of a 12-degree-of-freedom (DOF) shear-building model. The structure has a $2.5\,\mathrm{m} \times 2.5\,\mathrm{m}$ plan and is $3.6\,\mathrm{m}$ tall.^41,42 The sections are designed for a scale model. The structure has four floors, where the slabs and the beams move as rigid bodies with translation in the x and y directions and rotation $\theta$ about the center column. Hence, there are 3 DOFs per floor, which gives 12 DOFs in total. Figure 14 presents the analytical model diagram.

Figure 14.

Steel frame scale structure.

More details on the IASC-ASCE benchmark are presented in Chaabane et al.⁴² This model was developed to generate simulated input–output response data.⁴²

To construct the KPLS model, the input and output data are first scaled. The data set is then divided into a training set of $1000$ samples and a testing set of $1000$ samples; the testing set is used to validate the performance of the developed FD technique.
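The data preparation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the scaling is taken to be zero-mean, unit-variance standardization, and the mean and standard deviation are estimated from the training block only and then reused on the testing block, which is a design choice of this sketch rather than a detail reported here. The helper names `fit_scaler`, `apply_scaler`, and `split_and_scale` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Rows = Sequence[Sequence[float]]


def fit_scaler(train: Rows) -> Tuple[List[float], List[float]]:
    """Estimate per-variable mean and standard deviation from training rows."""
    columns = list(zip(*train))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by 0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(
    rows: Rows, means: Sequence[float], stds: Sequence[float]
) -> List[List[float]]:
    """Standardize every row with the supplied statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def split_and_scale(
    data: Rows, n_train: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split into training/testing blocks and scale both with training stats."""
    train, test = data[:n_train], data[n_train:]
    means, stds = fit_scaler(train)
    return apply_scaler(train, means, stds), apply_scaler(test, means, stds)
```

For the benchmark data described above, a call such as `split_and_scale(data, 1000)` would return the 1000 training samples and the 1000 testing samples on a common scale.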

When the multiscale KPLS is used to construct the model, the decomposition depth must be chosen carefully in order to obtain good detection. The best decomposition depth, that is, the one that gives the lowest MDR, FAR, and $ARL_{1}$, is equal to 3. To test the method, a mean shift fault of size $2\sigma$ is added to the third variable $X_{3}$ of the testing data set over samples 800 to 1000.
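A fault of this kind can be simulated as in the Python sketch below. It assumes that $\sigma$ denotes the standard deviation of the fault-free variable, estimated here from the supplied data unless given explicitly, and it uses zero-based, half-open sample indices; the function name `inject_mean_shift` is hypothetical.

```python
from statistics import pstdev
from typing import List, Optional, Sequence


def inject_mean_shift(
    rows: Sequence[Sequence[float]],
    column: int,
    start: int,
    stop: int,
    n_sigma: float = 2.0,
    sigma: Optional[float] = None,
) -> List[List[float]]:
    """Return a copy of `rows` with a constant bias on one variable.

    The bias n_sigma * sigma is added to `column` for row indices in
    [start, stop). If sigma is None it is estimated from the fault-free
    column (an assumption of this sketch).
    """
    if sigma is None:
        sigma = pstdev([row[column] for row in rows])
    shift = n_sigma * sigma
    faulty = [list(row) for row in rows]  # copy so the input stays intact
    for i in range(start, min(stop, len(faulty))):
        faulty[i][column] += shift
    return faulty
```

With 1000 testing samples and one-based sample numbering, the fault described above would correspond to a call such as `inject_mean_shift(test, column=2, start=799, stop=1000)`.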

The detection performances of the GLRT, OEWMA, and OEWMA-GLRT charts based on the linear (PLS) and nonlinear (KPLS) models are shown in Figures 15–20 and Table 3. Figures 15–17 and Table 3 show that the PLS-based charts monitor the structure poorly. This is because PLS assumes that the relationships between variables are linear, so it may not be appropriate for nonlinear analysis. To deal with nonlinear practical processes, a nonlinear PLS (KPLS) is therefore applied in the modeling phase. Figures 18–20 and Table 3 present the detection performances of the KPLS-based charts. According to Figure 19 and Table 3, the KPLS-based OEWMA chart is less effective than the KPLS-based OEWMA-GLRT chart (Figure 20 and Table 3): its MDR is lower (4.52 versus 13.72), but its FAR is far higher (91.82 versus 5.47). The KPLS-based charts outperform the PLS-based approaches (Figures 15–17 and Table 3) in MDR for all three charts and in FAR for the GLRT and OEWMA-GLRT charts.
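The difference between the three charts can be sketched in Python for a single scalar residual sequence. This is only an illustration under textbook assumptions, not the exact formulation used in this work: the residuals are taken to be independent, zero-mean Gaussian with known standard deviation `sigma` under fault-free conditions, the GLRT-type statistic is the squared standardized value (twice the log-likelihood ratio for an unknown mean shift), and the combined chart applies that statistic to the EWMA-filtered residuals using the asymptotic EWMA variance. The tuning of the smoothing parameter `lam` that makes the chart "optimized" is not reproduced, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def ewma(residuals: Sequence[float], lam: float, start: float = 0.0) -> List[float]:
    """EWMA recursion z_t = lam * r_t + (1 - lam) * z_{t-1}, with z_0 = start."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    z, out = start, []
    for r in residuals:
        z = lam * r + (1.0 - lam) * z  # recent samples receive more weight
        out.append(z)
    return out


def glrt_mean_shift(values: Sequence[float], sigma: float) -> List[float]:
    """Sample-wise GLRT-type statistic (x / sigma)**2 for an unknown mean shift."""
    return [(v / sigma) ** 2 for v in values]


def ewma_glrt(residuals: Sequence[float], lam: float, sigma: float) -> List[float]:
    """Apply the GLRT-type statistic to the EWMA-filtered residuals."""
    # Asymptotic standard deviation of the EWMA of i.i.d. noise.
    sigma_z = sigma * math.sqrt(lam / (2.0 - lam))
    return glrt_mean_shift(ewma(residuals, lam), sigma_z)


def alarms(statistic: Sequence[float], threshold: float = 3.841) -> List[bool]:
    """Flag samples above the threshold.

    3.841 is the 95% quantile of a chi-square variable with 1 degree of
    freedom, which is appropriate only under the assumptions stated above.
    """
    return [t > threshold for t in statistic]
```

Setting `lam = 1.0` removes the memory and reduces `ewma_glrt` to the plain GLRT-type chart, while smaller values average over more past samples, which smooths noise at the cost of a slower response to abrupt changes.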

Figure 15.

Results using PLS-based GLRT method.

Figure 16.

Results using PLS-based OEWMA method.

Figure 17.

Results using PLS-based OEWMA-GLRT method.

Figure 18.

Results using KPLS-based GLRT method.

Figure 19.

Results using KPLS-based OEWMA method.

Figure 20.

Results using KPLS-based OEWMA-GLRT method.

Table 3.

MDR, FAR, and $ARL_{1}$ evaluation using PLS- and KPLS-based GLRT, OEWMA, and OEWMA-GLRT charts.

Technique	GLRT			OEWMA			OEWMA-GLRT
Technique	MDR	FAR	$ARL_{1}$	MDR	FAR	$ARL_{1}$	MDR	FAR	$ARL_{1}$
PLS	41.14	35.31	21	44.74	29.74	1	41.18	29.45	14
KPLS	15.25	6.29	1	4.52	91.82	1	13.72	5.47	1

Figures 21 and 22 illustrate the detection performances and show the benefits of the multiscale representation when using the multiscale partial least squares (MSPLS)-based OEWMA-GLRT and MSKPLS-based OEWMA-GLRT methods. The MSKPLS-based OEWMA-GLRT method is more effective than the linear MSPLS-based OEWMA-GLRT method (see Table 4), and both outperform the classical charts. This is because the multiscale representation of the data helps to separate the deterministic features from the noise, which has a strong impact on the detection performances (Figures 21 and 22).

Figure 21.

Results using MSPLS-based OEWMA-GLRT method.

Figure 22.

Results using MSKPLS-based OEWMA-GLRT method.

Table 4.

MDR, FAR, and $ARL_{1}$ evaluation using MSPLS- and MSKPLS-based OEWMA-GLRT charts.

Technique	MSPLS			MSKPLS
Technique	MDR	FAR	$ARL_{1}$	MDR	FAR	$ARL_{1}$
OEWMA-GLRT	9.116	1.45	1	0	1.35	1

Conclusion

In this work, an MSKPLS-based OEWMA-GLRT technique is proposed for FD. The MSKPLS method is applied for modeling, and the OEWMA-GLRT chart is used for detection. The developed OEWMA-GLRT computes an optimal statistic that merges current and previous information while giving more weight to the most recent data. Its tuning parameters are chosen to minimize the MDR, the FAR, and the out-of-control average run length ($ARL_{1}$). This provides a more accurate estimate of the GLRT statistic and a stronger memory, which enables better decision making with respect to FD. The monitoring effectiveness of the proposed algorithm was compared with that of classical algorithms using two examples: synthetic data and a benchmark structure. The results demonstrate the detection efficiency of the proposed technique in terms of MDR, FAR, and $ARL_{1}$ values.

Footnotes

Handling Editor: James Baldwin

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was made possible by NPRP grant NPRP9-330-2-140 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

ORCID iDs

Marwa Chaabane

Majdi Mansouri

References

1. León, David Clark. Detecting changes in field reliability using data from a complex factory screen. Qual Eng 2004; 17: 67–75.

2. Dzunic, Chen, Mobahi, et al. A Bayesian state-space approach for damage detection and classification. Mech Syst Signal Pr 2017; 96: 239–259.

3. Wang, Law, et al. Covariance of dynamic strain responses for structural damage detection. Mech Syst Signal Pr 2017; 95: 90–105.

4. Deng, Xie. Damage detection with streamlined structural health monitoring data. Sensors 2015; 15: 8832–8851.

5. Yan, De Boe, Golinval JC. Structural damage diagnosis by Kalman model based on stochastic subspace identification. Struct Health Monit 2004; 3: 103–119.

6. Jiang, Yuan, Liu. Bayesian inference method for stochastic damage accumulation modeling. Reliab Eng Syst Safe 2013; 111: 126–138.

7. Kresta, MacGregor, Marlin TE. Multivariate statistical monitoring of process operating performance. Can J Chem Eng 1991; 69: 35–47.

8. Wang. Enhanced fault detection for nonlinear processes using modified kernel partial least squares and the statistical local approach. Can J Chem Eng 2018; 96: 1116–1126.

9. Lee, Qin, Lee IB. Fault detection of non-linear processes using kernel independent component analysis. Can J Chem Eng 2007; 85: 526–536.

10. Mansouri, Nounou, et al. Kernel PCA-based GLRT for nonlinear fault detection of chemical processes. J Loss Prevent Process Ind 2016; 40: 334–347.

11. Wold, Sjöström, Eriksson. PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab Syst 2001; 58: 109–130.

12. Wang, Liu, Jiang, et al. Casing vibration response prediction of dual-rotor-blade-casing system with blade-casing rubbing. Mech Syst Signal Pr 2019; 118: 61–77.

13. Wang, Jiang, Yang, et al. Study on the diagnosis of rub-impact fault based on finite element method and envelope demodulation. J Vibroeng 2016; 18: 4500–4512.

14. Wang, Jiang, Yang. Dual-tree complex wavelet transform and SVD-based acceleration signals denoising and its application in fault features enhancement for wind turbine. J Vib Eng Technol 2019; 7: 311–320.

15. Zhang, Jiang, Han, et al. Rotating machinery fault diagnosis for imbalanced data based on fast clustering algorithm and support vector machine. J Sensors 2017; 2017: 8092691.

16. Shirali, Mohammadfam, Ebrahimipour. A new method for quantitative assessment of resilience engineering by PCA and NT approach: a case study in a process industry. Reliab Eng Syst Safe 2013; 119: 88–94.

17. Zhao, Fan, Wang. Non-linear partial least squares response surface method for structural reliability analysis. Reliab Eng Syst Safe 2017; 161: 69–77.

18. Kourti, MacGregor JF. Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemometr Intell Lab Syst 1995; 28: 3–21.

19. Sheriff, Botre, Mansouri, et al. Process monitoring using data-based fault detection techniques: comparative studies. In: Demetgul, Ünal (eds) Fault diagnosis and detection. London: IntechOpen, 2017.

20. Shawe-Taylor, Cristianini. An introduction to support vector machines and other kernel-based learning methods. Cambridge: Cambridge University Press, 2000.

21. Nguyen, Golinval JC. Fault detection based on kernel principal component analysis. Eng Struct 2010; 32: 3683–3691.

22. Mansouri, Nounou, Harkat, et al. Fault detection of chemical processes using improved generalized likelihood ratio test. In: Proceedings of the 22nd international conference on digital signal processing (DSP), London, 23–25 August 2017, pp.1–5. New York: IEEE.

23. Godoy, Zumoffen, Vega, et al. New contributions to non-linear process monitoring through kernel partial least squares. Chemometr Intell Lab Syst 2014; 135: 76–89.

24. Teppola, Minkkinen. Wavelet–PLS regression models for both exploratory data analysis and process monitoring. J Chemometrics 2000; 14: 383–399.

25. Zhang. Multivariate process monitoring and analysis based on multi-scale KPLS. Chem Eng Res Des 2011; 89: 2667–2678.

26. Bakshi, Top. Multiscale statistical process monitoring and diagnosis of univariate and multivariate processes. New York: American Institute of Chemical Engineers, 1998.

27. Sheriff, Mansouri, Karim, et al. Fault detection using multiscale PCA-based moving window GLRT. J Process Control 2017; 54: 47–64.

28. Gustafsson. The marginalized likelihood ratio test for detecting abrupt changes. IEEE Trans Autom Control 1996; 41: 66–78.

29. Willsky, Chow, Gershwin, et al. Dynamic model-based techniques for the detection of incidents on freeways. IEEE Trans Autom Control 1980; 25: 347–360.

30. Kay SM. Fundamentals of statistical signal processing: detection theory, vol. 2. Upper Saddle River, NJ: Prentice Hall, 1998.

31. Botre, Mansouri, Karim, et al. Multiscale PLS-based GLRT for fault detection of chemical processes. J Loss Prevent Process Ind 2017; 46: 143–153.

32. Roberts SW. Control chart tests based on geometric moving averages. Technometrics 1959; 1: 239–250.

33. Hunter JS. The exponentially weighted moving average. J Qual Technol 1986; 18: 203–210.

34. Montgomery DC. Introduction to statistical quality control. New York: John Wiley & Sons, 2009.

35. Baklouti, Mansouri, Hamida, et al. Monitoring of wastewater treatment plants using improved univariate statistical technique. Process Saf Environ Protect 2018; 116: 287–300.

36. Zhang, Guo, et al. Flexibility-based structural damage detection with unknown mass for IASC-ASCE benchmark studies. Eng Struct 2013; 48: 486–496.

37. Johnson, Lam, Katafygiotis, et al. A benchmark problem for structural health monitoring and damage detection. In: Casciati, Magonette (eds) Structural control for civil and infrastructure engineering. Singapore: World Scientific, 2001, pp.317–324.

38. Reda Taha. A neural-wavelet technique for damage identification in the ASCE benchmark structure using phase II experimental data. Adv Civ Eng 2010; 2010: 675927.

39. Huang, Beck. Bayesian system identification based on hierarchical sparse Bayesian learning and Gibbs sampling with application to structural damage assessment. Comput Meth Appl Mech Eng 2017; 318: 382–411.

40. Bakshi BR. Multiscale PCA with application to multivariate statistical process monitoring. AIChE J 1998; 44: 1596–1610.

41. Johnson, Lam, Katafygiotis, et al. Phase I IASC-ASCE structural health monitoring benchmark problem using simulated data. J Eng Mech 2004; 130: 3–15.

42. Chaabane, Mansouri, Ben Hamida, et al. Multivariate statistical process control-based hypothesis testing for damage detection in structural health monitoring systems. Struct Control Health Monit 2019; 26: e2287.