A distributed expectation maximization-principal component analysis monitoring scheme for the large-scale industrial process with incomplete information

Abstract

Large-scale process monitoring has become a challenging issue because the integration of sub-systems or subprocesses in modern industrial processes leads to numerous variables with complex relationships and potentially missing information. To address this, a distributed expectation maximization-principal component analysis scheme is proposed in this paper. The process variables are first divided into several sub-blocks using a two-layer process decomposition method based on process knowledge and the generalized Dice’s coefficient. Then, the missing information of the variables is estimated by the expectation maximization algorithm in the principal component analysis framework, and the expectation maximization-principal component analysis method is applied to each sub-block for fault detection. Finally, the fault detection results of the sub-blocks are fused by Bayesian inference. A case study on the Tennessee Eastman process is used to show the effectiveness and performance of the proposed approach.

Keywords

Distributed expectation maximization-principal component analysis, incomplete information, fault detection, large-scale process, Bayesian inference

Introduction

Driven by high industrial requirements on system safety and process reliability, intensive attention has been devoted to process monitoring in large-scale industrial processes,^1–7 for which both model-based and data-driven methods are available. Several model-based approaches exist; for instance, numerous studies obtain residuals in linear and nonlinear systems by observer techniques.^8–13 However, the applicability of these methods is limited, as establishing an accurate first-principles model of a complex system is often difficult and sometimes even impossible.

Among the data-driven approaches, multivariate statistical process monitoring (MSPM) methods have developed rapidly, such as the partial least squares (PLS) method,¹⁴ multivariate statistical process control (SPC) methods,¹⁵ the concurrent projection to latent structures method,¹⁶ and multi-block kernel partial least squares.¹⁷ Among them, principal component analysis (PCA) is regarded as a useful tool⁴ for reducing the dimension of the original data while preserving the main correlation structure between the variables. For large-scale processes in which numerous variables with complex relationships exist, distributed methods^3–5 are employed, and the distributed PCA method is widely used to simplify the monitoring task. To achieve distributed monitoring, it is important to divide the process into sub-blocks. The conventional approach uses prior process knowledge during the decomposition, and the final monitoring results from each sub-block are fused by Bayesian inference.^3,18–20 However, it is not always easy to obtain accurate process information for variable division, in which case data-driven methods are an alternative.²¹ For instance, mutual information (MI) has been applied in a distributed monitoring framework, and a distributed monitoring method integrating MI-spectral clustering and Bayesian inference has been proposed.^22–24 In addition, the generalized Dice’s coefficient (GDC) of a loading matrix was used to divide large-scale process variables and form a multi-block monitoring framework.²⁵ The loading matrix describes the mapping between the original variables and the principal component space. It reveals the correlation between the original variables and the extracted components, which reflects the nature of the process.
Furthermore, the loading matrix has a lower dimension for block division; however, process noise may lead to performance degradation or decomposition mistakes with the GDC method alone. Hence, a two-layer decomposition method combining knowledge-based and GDC-based division is proposed to improve the monitoring performance for large-scale processes. In the first layer, the process variables are divided into blocks according to prior knowledge. In the second layer, each block is further divided into sub-blocks by GDC.

Most data-driven methods rely on complete data to achieve process monitoring and fault detection, whereas incomplete or missing data are common in large-scale industrial processes and may be caused by sensor failure, heavy control network traffic, data overflow, and other potential errors. Therefore, it is necessary to find an appropriate method for data interpolation, such as the mean substitution method,²⁶ the regression interpolation method,²⁷ or the expectation maximization (EM) algorithm.²⁸ However, the mean substitution method has a large deviation, and the regression interpolation method only applies to several specific cases. The EM algorithm is an iterative method for maximum-likelihood estimation that can deal with multiple missing-data patterns.⁶ To address large-scale data interpolation, EM has been applied to the PCA algorithm,^29–31 which can obtain the optimal values of the blanks in incomplete data. The EM algorithm for PCA is computationally simple and efficient, because only a few eigenvectors and eigenvalues are needed even when the monitoring data are high-dimensional.¹⁹ However, this method has mostly been used for centralized process monitoring; in this paper, it is applied in each sub-block with missing data in order to obtain complete data sets.

The motivation of this work is to propose a distributed EM-PCA (DEM-PCA) monitoring method for fault detection with missing data. The large-scale variables are first divided into several sub-blocks based on prior knowledge and GDC. Then, a PCA model is constructed in each sub-block to achieve fault detection, and the EM-PCA method is used to deal with incomplete measurement data. Finally, a comprehensive monitoring result for the whole process is obtained by Bayesian inference. The proposed DEM-PCA scheme is applied to the Tennessee Eastman (TE) benchmark to show the monitoring performance.

Preliminaries

PCA for process monitoring

PCA is a fundamental method that decomposes high-dimensional process data into orthogonal low-dimensional subspaces and preserves the main information. Suppose a set of process monitoring data is denoted as $Y \in R^{m \times n}$ , where m is the number of observation variables and n is the number of samples. The traditional PCA model decomposition is given as³

Y = CX + E

(1)

where the matrix $X \in R^{k \times n}$ contains the principal components, $C \in R^{m \times k}$ is called the loading matrix, and the matrix E consists of residual variables. Here k is the number of principal components, and the principal component subspace (PCS) is described by CX. The transform matrix C is usually calculated by eigenvalue decomposition of the covariance matrix of the observations Y, which is denoted as S

S = \frac{Y Y^{T}}{n - 1} = C Λ C^{T}

(2)

where $Λ = diag (λ_{1}, λ_{2}, \dots, λ_{m})$ . The matrix Y is then mapped to the low-dimensional score matrix X by the following expression

X = C^{T} Y

(3)

Then, two traditional statistics, $T^{2}$ and Q, are used to detect process faults. Their values and confidence limits can be calculated as follows. If a statistic exceeds its confidence limit, a fault is declared.

Hotelling’s $T^{2}$ statistic measures the variation of sample vectors in the principal space; it is given as⁴

T^{2} = y^{T} C \cdot Λ^{- 1} \cdot C^{T} y \leq T_{\lim}^{2}

(4)

where $Λ = diag {λ_{1}, λ_{2}, \dots, λ_{k}}$ and $T_{\lim}^{2}$ is the confidence limit of the $T^{2}$ statistic. The control limit $T_{\lim}^{2}$ is conventionally calculated as

T_{\lim}^{2} = \frac{k (n - 1)}{n - k} F_{k, (n - k), α}

(5)

where $α$ is the significance level and $F_{k, (n - k), α}$ is the critical value of the F-distribution with k and $(n - k)$ degrees of freedom at significance level $α$ .

Q statistic is an index that indicates the change of the sample vector projection in the residual space

Q = ‖ (I - C \cdot C^{T}) \cdot y ‖^{2} \leq Q_{\lim}

(6)

where $Q_{\lim}$ is the confidence limit of Q statistic.

The control limit $Q_{\lim}$ can be expressed as follows

Q_{\lim} = θ_{1} {[1 + \frac{c_{a} \sqrt{2 θ_{2} h_{o}^{2}}}{θ_{1}} + \frac{θ_{2} h_{o} (h_{o} - 1)}{θ_{1}^{2}}]}^{\frac{1}{h_{o}}}

(7)

where $α$ is the significance level, $c_{a}$ is the threshold of the standard normal distribution at significance level $α$ , and $θ_{i} = \sum_{j = k + 1}^{m} λ_{j}^{i}, i = 1, 2, 3$ , where $λ_{j}$ is the jth eigenvalue of the covariance matrix S of Y; the index $h_{o}$ is given as

h_{o} = 1 - \frac{2 θ_{1} θ_{3}}{3 θ_{2}^{2}}

(8)

By comparing the $T^{2}$ and Q statistics with their control limits $T_{\lim}^{2}$ and $Q_{\lim}$ , a fault can be detected.
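The statistics in equations (2)–(4) and (6) and the Q control limit in equations (7) and (8) can be computed as in the following minimal NumPy sketch. The function names are hypothetical, the data matrix is assumed to be standardized with variables in rows, and the normal threshold `c_alpha` is assumed to be supplied by the caller (for example, about 2.326 for α = 0.01). The index h_o is taken in its usual form, 1 − 2θ₁θ₃/(3θ₂²). The T² limit of equation (5) is omitted because it needs an F-distribution quantile.

```python
import numpy as np

def fit_pca(Y, k):
    """Y is m x n (variables x samples). Returns loadings C (m x k) and all eigenvalues."""
    n = Y.shape[1]
    S = Y @ Y.T / (n - 1)                      # covariance matrix, eq. (2)
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs[:, :k], eigvals

def t2_statistic(y, C, eigvals):
    k = C.shape[1]
    x = C.T @ y                                # scores, eq. (3)
    return float(x @ (x / eigvals[:k]))        # eq. (4)

def q_statistic(y, C):
    r = y - C @ (C.T @ y)                      # residual (I - C C^T) y
    return float(r @ r)                        # eq. (6)

def q_limit(eigvals, k, c_alpha):
    res = eigvals[k:]                          # eigenvalues of the residual space
    th1, th2, th3 = (np.sum(res ** i) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    term = (1.0 + c_alpha * np.sqrt(2.0 * th2 * h0 ** 2) / th1
            + th2 * h0 * (h0 - 1.0) / th1 ** 2)
    return float(th1 * term ** (1.0 / h0))     # eq. (7)
```

A sample y would be flagged when `t2_statistic` or `q_statistic` exceeds the corresponding limit.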

In addition, the contribution plot is used to find abnormal variables for fault diagnosis in PCA algorithm,²⁹ which calculates the variables contribution rate of partial sub-blocks. According to their contribution value, one or several variables can be selected to be responsible for the fault. The basic calculation of variables contribution is given as follows.

Assume that y is the observed sample set of a sub-block, and the jth measured variable of process is over the limit, then its total contribution can be calculated by

CON X_{j} = \sum_{i = 1}^{k} (con x_{i, j})

(9)

where k is the number of the principal components, and $con x_{i, j}$ is the contribution of variable $y_{j}$ to the score $x_{i}$ . Here $x_{i}$ denotes the ith score element of X^T. Hence, $con x_{i, j}$ is obtained by the following expression

con x_{i, j} = \frac{x_{i}}{λ_{i}} c_{i, j} (y_{j})

(10)

where $λ_{i}$ is the corresponding eigenvalue from the matrix $Λ$ , and $c_{i, j}$ is given by the loading matrix C of the sub-block.

Generally, the larger the contribution value of a variable, the more likely this variable is responsible for the fault. However, identifying the actual industrial cause of the failure also requires background knowledge of the process.
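Equations (9) and (10) can be illustrated with a short NumPy sketch. The function name is hypothetical, and $c_{i, j}$ is interpreted here as the loading of variable j on component i, that is, entry (j, i) of the m × k loading matrix C; this interpretation is an assumption. Under it, the contributions of all variables add up to the $T^{2}$ value of the sample, which gives a simple consistency check.

```python
import numpy as np

def t2_contributions(y, C, eigvals):
    """Return CON X_j for every variable j of one sample y (eqs. 9-10).

    y: sample vector (m,), C: loading matrix (m x k), eigvals: eigenvalues
    in descending order (at least k of them).
    """
    k = C.shape[1]
    x = C.T @ y                                   # scores x_i, i = 1..k
    # con[i, j] = (x_i / lambda_i) * c_{i,j} * y_j, with c_{i,j} = C[j, i] (assumed)
    con = (x / eigvals[:k])[:, None] * C.T * y[None, :]
    return con.sum(axis=0)                        # sum over components, eq. (9)
```

The variable with the largest entry in the returned vector would be the first candidate to inspect.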

The EM-PCA algorithm

PCA is an effective method which can be used to monitor process with complete data. However, sometimes measurement data is unavailable due to sensor failure or heavy network traffic. In that case, the EM algorithm,⁶ as an iterative algorithm, can be applied to estimate incomplete sample data with the loss of several measured variables.

The PCA linear latent variable model is written as

y = Cx + e

(11)

where $x \in R^{k}$ is the latent variable, k is the number of principal components, $y \in R^{m}$ is an observed variable, the matrix $C \in R^{m \times k}$ is a loading matrix that provides a linear mapping from x to y, and e denotes the noise variable. Since PCA can be regarded as a special case of the linear Gaussian model, the following assumptions are made: (1) x follows a Gaussian distribution, $x ~ N (0, I)$ , with independent components. (2) The noise variable e follows a Gaussian distribution, $e ~ N (0, σ^{2} I)$ . Hence, $y ~ N (0, C C^{T} + σ^{2} I)$ is the Gaussian probability model used by the EM algorithm.

Since the noise is Gaussian, $e ~ N (0, σ^{2} I)$ , and $e = y - Cx$ , the conditional probability of y given x is

p (y | x; C, σ^{2}) = {(2 π σ^{2})}^{- \frac{m}{2}} e^{- \frac{1}{2 σ^{2}} {‖ y - Cx ‖}^{2}}

(12)

The assumed prior probability of x is

p (x) = {(2 π)}^{- \frac{k}{2}} e^{- \frac{x^{T} x}{2}}

(13)

Thus, the marginal probability of y is given as

p (y; C, σ^{2}) = {(2 π)}^{- \frac{m}{2}} {| Σ |}^{- \frac{1}{2}} e^{- \frac{y^{T} Σ^{- 1} y}{2}}

(14)

where $Σ$ is defined as

Σ = C C^{T} + σ^{2} I

(15)

The log-likelihood function of measured variables y can be expressed as

\begin{matrix} L (C, σ^{2}) = & \sum_{i = 1}^{n} \ln p (y_{i}; C, σ^{2}) = - \frac{nm}{2} \ln (2 π) \\ & - \frac{n}{2} \ln | Σ | - \frac{n}{2} tr (Σ^{- 1} S) \end{matrix}

(16)

where S is the covariance matrix of measured samples, which is defined as above expression (2).

Furthermore, the posterior probability of x can be obtained by following equation

\begin{matrix} p (x | y) & = \frac{p (y | x) p (x)}{p (y)} \\ = {(2 π)}^{- k / 2} {| σ^{- 2} M |}^{1 / 2} e^{- \frac{{(x - M^{- 1} C^{T} y)}^{T} (σ^{- 2} M) (x - M^{- 1} C^{T} y)}{2}} \end{matrix}

(17)

where the matrix M is defined as $M = C^{T} C + σ^{2} I$ .

To estimate C and x, the EM algorithm performs maximum-likelihood estimation (MLE) iteratively in two steps: the E-step computes the expectation of the log-likelihood with respect to the posterior distribution of the unknown latent variables, and the M-step updates the parameters by maximizing the expected log-likelihood obtained in the E-step.

E-step

The posterior distribution of x given y is obtained by the Bayesian rule as follows

p (x | y) ~ N (M^{- 1} C^{T} y, σ^{2} M^{- 1})

(18)

where matrix M is defined as $M = C^{T} C + σ^{2} I$ .

In addition, the expectation of x and y are denoted as $< x > = M^{- 1} C^{T} y$ and $< y > = Cx$ , so extending to the matrix form, the expectation of matrix X and Y are shown as

〈 X 〉 = M^{- 1} C^{T} Y

(19)

〈 Y 〉 = CX

(20)

where $X \in R^{k \times n}, Y \in R^{m \times n}$ , n is the number of measured samples.

M-step

The joint distribution of x and y is given by

p (y, x) = p (y | x) p (x) = {(2 π σ^{2})}^{- \frac{m}{2}} \cdot e^{- \frac{{‖ y - Cx ‖}^{2}}{2 σ^{2}}} {(2 π)}^{- \frac{k}{2}} \cdot e^{- \frac{{‖ x ‖}^{2}}{2}}

(21)

Therefore, the log-likelihood of $p (y, x)$ is expressed by

\begin{matrix} \ln [p (y, x)] = & - \frac{(m + k)}{2} \ln (2 π) - \frac{m}{2} \ln (σ^{2}) \\ & - \frac{{‖ y - Cx ‖}^{2}}{2 σ^{2}} - \frac{{‖ x ‖}^{2}}{2} \end{matrix}

(22)

and the expectation of this log-function with matrix X and Y is given

\begin{matrix} 〈 L 〉 = - \frac{n}{2} (m + k) \ln (2 π) - \frac{nm}{2} \ln (σ^{2}) \\ - \frac{1}{2 σ^{2}} tr (Y^{T} Y - 2 {〈 X 〉}^{T} C^{T} Y + C^{T} C 〈 X X^{T} 〉) \\ - \frac{1}{2} tr (〈 X X^{T} 〉) \end{matrix}

(23)

where $< X X^{T} > = σ^{2} M^{- 1} + < X > < X >^{T}$ .

Differentiating the expectation function $< L >$ with respect to C and $σ^{2}$ and setting the derivatives to zero, we obtain the two expressions

\tilde{C} = Y {〈 X 〉}^{T} {〈 X X^{T} 〉}^{- 1}

(24)

\tilde{σ^{2}} = \frac{1}{nm} tr (Y^{T} Y - 2 {〈 X 〉}^{T} C^{T} Y + C^{T} C 〈 X X^{T} 〉)

(25)

To obtain the final result, we let the noise variance $σ^{2}$ tend to zero, so that M reduces to $C^{T} C$ and $< X X^{T} >$ reduces to $< X > < X >^{T}$ . Equations (19), (20), and (24) then yield the final expressions of the EM-PCA algorithm

E-step

〈 X 〉 = {(C^{T} C)}^{- 1} C^{T} Y

(26)

〈 Y 〉 = CX

(27)

M-step

C^{new} = Y X^{T} {(X X^{T})}^{- 1}

(28)

Regarding the expectation matrix $< X >$ and $< Y >$ as a new iterative result, the E-step can be simplified as

X = {(C^{T} C)}^{- 1} C^{T} Y

(29)

Y = CX

(30)

The EM-PCA algorithm provides complete data for subsequent fault detection and diagnosis and also reduces the computational dimension. The E and M steps are applied to estimate the blank parts of the process monitoring data. After multiple iterations, an estimated complete data set is obtained, which can then be used for process detection and diagnosis to obtain a more accurate result.
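For complete, centered data, the simplified E and M steps in equations (28) and (29) can be iterated as in the following minimal NumPy sketch. The function name, the random initialization, and the fixed iteration count are assumptions made for illustration. The columns of the returned C are in general not orthonormal, but they span the same subspace as the leading k eigenvectors of the covariance matrix S once the iteration has converged.

```python
import numpy as np

def em_pca(Y, k, n_iter=200, seed=0):
    """Iterate the zero-noise EM-PCA steps on centered data Y (m x n).

    Returns the loading matrix C (m x k) and the scores X (k x n).
    """
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((Y.shape[0], k))       # random initial loadings (assumed)
    for _ in range(n_iter):
        X = np.linalg.solve(C.T @ C, C.T @ Y)      # E-step, eq. (29)
        C = Y @ X.T @ np.linalg.inv(X @ X.T)       # M-step, eq. (28)
    X = np.linalg.solve(C.T @ C, C.T @ Y)          # scores for the final loadings
    return C, X
```

Only k × k systems are solved in each iteration, which is the source of the computational saving mentioned above when k is much smaller than m.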

In addition, to remove the differences in scale between measured variables, the data should first be standardized. After the proximity method (substituting the previous value) is used to obtain a complete data set, the standardization is given as follows

y b_{ij} = \frac{y_{ij} - \bar{y_{i}}}{s_{i}}

(31)

where $i = 1, \dots, m$ and $j = 1, \dots, n$ , $\bar{y_{i}}$ is the mean of the ith observed variable, and $s_{i}$ is the standard deviation of the ith observed variable.
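The preprocessing for one variable can be sketched in plain Python as follows. The function names are hypothetical, missing values are represented by `None`, a gap at the very start of the series is filled with the first observed value, and the sample standard deviation (divisor n − 1) is used; the last two choices are assumptions, since the text does not specify them.

```python
from statistics import mean, stdev

def fill_previous(row):
    """Proximity method: replace each None with the most recent observed value.

    Leading gaps are filled with the first observed value (an assumption).
    """
    last = next(v for v in row if v is not None)
    filled = []
    for v in row:
        last = v if v is not None else last
        filled.append(last)
    return filled

def standardize(row):
    """Standardize one variable as in eq. (31), using the sample standard deviation."""
    mu, s = mean(row), stdev(row)
    return [(v - mu) / s for v in row]
```

For example, `standardize(fill_previous([1.0, None, 3.0]))` first fills the gap with 1.0 and then scales the series to zero mean and unit sample variance.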

Distributed monitoring scheme with EM-PCA

In this section, the distributed monitoring based on EM-PCA method is proposed in detail. In distributed monitoring, the first step is to decompose the process into several units for dimensionality reduction. In addition, considering missing data problem in the measured variables, EM-PCA algorithm is applied.

Process decomposition

The division of the whole process data set has several advantages: distributed monitoring can improve the sensitivity of fault detection, fault tolerance is increased by having more than one monitoring block, and the computational complexity of the monitoring process is reduced.

In this paper, a two-layer decomposition method for large-scale processes is proposed that uses both process knowledge and the GDC of the loading matrix. Using prior knowledge of the process, the whole monitoring system is divided into first-layer blocks; each block is then treated separately in the second layer using GDC. Finally, a two-layer distributed monitoring model is constructed in which the large set of monitoring variables is divided twice into several sub-blocks.

When the process is decomposed by the knowledge-based method in the first layer, several points need attention:

The sensor variables should be assigned to blocks so as to minimize the possibility of simultaneous failures across blocks. Then, even if some blocks malfunction, the remaining blocks still work and the monitoring system can continue to give a reliable result.

Based on prior process knowledge, the blocks should be as diverse as possible while together covering the whole process. As a result, each monitoring block has little influence on the others, and the monitoring can be more accurate.

The monitoring results of the blocks should be communicated among blocks in order to achieve fault diagnosis of the whole system. Therefore, the communication between blocks should be considered when the large-scale process is decomposed.

However, the variables in the first-layer blocks only reflect the general behavior of the process; the local behavior inside the sub-systems, which is also important for monitoring, is not reflected. Therefore, after the process variables are divided into first-layer blocks, each block should be further clustered through data-driven approaches. In this paper, a GDC-based method using the loading matrix is proposed to obtain the further division of the blocks. Suppose that for a data set $Y \in R^{m \times n}$ the PCA model $Y = CX + E$ is constructed, with the original loading matrix $C = [\begin{matrix} c_{11} & \dots & c_{1 m} \\ ⋮ & ⋱ & ⋮ \\ c_{m 1} & \dots & c_{mm} \end{matrix}]$ defined as in section “The EM-PCA algorithm.” It is worth noting that if an element $c_{ij}$ is close to 1 or −1, the corresponding relationship is closer, and this correlation can influence the T² statistic. Thus, it is helpful to define a new weight-loading vector as follows

\tilde{c_{j}} = {(| c_{1 j} |, | c_{2 j} |, \dots, | c_{mj} |)}^{T}

(32)

where $| c_{ij} |$ denotes the absolute value of the element in the ith row and jth column of matrix C.

The GDC can be obtained by following expression

sim (a, b) = \frac{2 \sum_{t = 1}^{m} a_{t} b_{t}}{\sum_{t = 1}^{m} a_{t}^{2} + \sum_{t = 1}^{m} b_{t}^{2}}

(33)

where $a = (a_{1}, a_{2}, \dots, a_{m})$ and $b = (b_{1}, b_{2}, \dots, b_{m})$ ; and m is the number of measured variables.

Then, the GDC is calculated between weight-loading vectors, and the original data set is divided according to the GDC values. The GDC between each pair of weight-loading vectors is obtained from expression (33), where a and b are replaced by two different weight-loading vectors $\tilde{c_{v}}$ and $\tilde{c_{w}}$ , respectively. Therefore, the GDC is calculated as follows

sim (\tilde{c_{v}}, \tilde{c_{w}}) = \frac{2 \sum_{t = 1}^{m} | c_{tv} | | c_{tw} |}{\sum_{t = 1}^{m} c_{tv}^{2} + \sum_{t = 1}^{m} c_{tw}^{2}}

(34)

If the GDC parameters are close to 1, the variables should have high similarity. In order to make an objective decision, a threshold $α$ is proposed to judge a correlation between two vectors

D (\tilde{c_{v}}, \tilde{c_{w}}) = \left\{ \begin{matrix} 0, & if sim (\tilde{c_{v}}, \tilde{c_{w}}) < α \\ 1, & if sim (\tilde{c_{v}}, \tilde{c_{w}}) \geq α \end{matrix} \right.

(35)

When D = 1, the loading vector $c_{w}$ is grouped with $c_{v}$ . The first sub-block C₁ is formed around the loading vector with the largest eigenvalue. The second sub-block C₂ is then formed around the loading vector with the largest eigenvalue among those not assigned to C₁, and so on. Applying the GDC method to the complete loading matrix yields the second-layer process decomposition, so that all the information contained in the loadings is used.

To summarize, the first step of the decomposition uses the knowledge-based method to divide the large-scale process into first-layer blocks. Each first-layer block is then divided again by the GDC method applied to the loading matrix. After these two divisions, each measured variable belongs to exactly one second-layer sub-block. However, several variables contain randomly missing data, which can degrade the monitoring performance. Thus, the monitoring results of the sub-blocks are obtained by the EM-PCA method, which estimates the missing parts. Since these results are given at the second layer while the final goal is a fault detection result for the whole large-scale process, Bayesian inference is used to fuse the information and give a combined fault detection result.
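The GDC of expression (33) and the thresholded grouping of expression (35) can be sketched in plain Python as follows. The function names and the greedy grouping order are assumptions for illustration: the leading loading vector that is still unassigned seeds a group, and every remaining vector whose GDC with the seed reaches the threshold joins it. How the grouped loading vectors are mapped back to variable sub-blocks is not detailed in the text, so the sketch stops at the grouping.

```python
def gdc(a, b):
    """Generalized Dice's coefficient between two vectors, eq. (33)."""
    num = 2.0 * sum(x * y for x, y in zip(a, b))
    den = sum(x * x for x in a) + sum(y * y for y in b)
    return num / den

def group_loadings(C, alpha):
    """Group the columns of a loading matrix by GDC.

    C: list of m rows; columns are assumed ordered by decreasing eigenvalue.
    Returns a list of groups of column indices.
    """
    m, n_cols = len(C), len(C[0])
    # weight-loading vectors: absolute values of each column, eq. (32)
    weights = [[abs(C[t][j]) for t in range(m)] for j in range(n_cols)]
    remaining = list(range(n_cols))
    groups = []
    while remaining:
        v = remaining[0]                       # seed: leading unassigned vector
        group = [v] + [w for w in remaining[1:]
                       if gdc(weights[v], weights[w]) >= alpha]   # D = 1, eq. (35)
        groups.append(group)
        remaining = [w for w in remaining if w not in group]
    return groups
```

With a threshold close to 1, only nearly identical weight-loading vectors are grouped, so the threshold controls how many sub-blocks result.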

EM-PCA handling missing data and Bayesian inference fusion

The process monitoring performance can be affected by the large number of measured variables coming from different local sensors and units of the industrial process. Therefore, the two-layer process decomposition method of section “Process decomposition” is used to divide the process variables into several sub-blocks. Once the distributed monitoring structure is established, the EM-PCA method is applied to estimate the incomplete data.

The data set $Y \in R^{m \times n}$ is measured from the large-scale process, and the variables are divided as follows

Y = {[Y_{1}, Y_{2}, \dots, Y_{B}]}^{T}

(36)

where B is the number of distributed sub-blocks, each sub-block is denoted by $Y_{b} \in R^{n \times m_{b}}$ , and $m_{b}$ is the number of measured variables in the bth sub-block. Some variables in certain sub-blocks have randomly missing values, so the EM-PCA method is used in the sub-blocks with incomplete data.

According to the algorithm in section “The EM-PCA algorithm,” the EM-PCA method is used to estimate missing data in sub-block set $Y_{i} (i = 1, 2, \dots, B)$ , and the steps are summarized as follows:

(1) Fill the missing entries of $Y_{i}$ using the proximity method and standardize the variables.

(2) Initialize the matrix $C_{b} \in R^{m_{b} \times b_{k}}$ randomly, where $b_{k}$ is the number of principal components of sub-block b; set p as the maximum number of iterations and $γ$ as the tolerance limit.

(3) Obtain an optimal matrix C_b by iterating the E and M steps, which are given by expressions (29) and (28) in section “The EM-PCA algorithm.” The iterative calculation for each sub-block can be written as follows

E - step : X_{b} = {({C_{b}}^{T} C_{b})}^{- 1} C_{b}^{T} Y_{b}

M - step : C_{b}^{new} = Y_{b} X_{b}^{T} {(X_{b} X_{b}^{T})}^{- 1}

(4) Obtain a new estimated matrix $Y_{b}^{new}$ as $Y_{b}^{new} = C_{b} X_{b}$ .

(5) Replace the missing part of the incomplete data set $Y_{b}$ with the entries at the corresponding positions of $Y_{b}^{new}$ .

(6) Stop the iteration when the estimated parts are almost unchanged or the iteration limit is reached (that is, if $| | Y_{b}^{new} - Y_{b} | | \leq γ$ or p iterations of the E and M steps have been performed); otherwise, go back to step (3).

In addition, the final matrices $X_{b}$ and $C_{b}$ obtained by the EM-PCA method play the same roles as the principal component matrix and the loading matrix in a regular PCA model, and are denoted by $X_{b}^{* T}$ and $C_{b}^{*}$ , respectively.

After process decomposition by the two-layer method and missing-value estimation by the EM-PCA algorithm, a PCA monitoring model is established in each sub-block to obtain the fault detection results. The statistics T² and Q are calculated by formulas (4) and (6), which are expressed as follows
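The iterative estimation of missing entries in one sub-block can be sketched with NumPy as follows. The function name and its arguments are hypothetical: `Y_filled` is assumed to hold variables in rows and samples in columns, already filled by the proximity method and standardized, and `missing` is a boolean mask of the originally missing positions. The stopping rule compares the estimates at the missing positions between two consecutive iterations, which is one possible reading of the tolerance test.

```python
import numpy as np

def em_pca_impute(Y_filled, missing, k, p=500, gamma=1e-8, seed=0):
    """Re-estimate the entries of Y_filled flagged in `missing` by EM-PCA.

    Y_filled: (m_b x n) array, pre-filled and standardized (assumed).
    missing:  boolean array of the same shape, True where data was missing.
    k: number of principal components, p: iteration limit, gamma: tolerance.
    """
    Y = np.array(Y_filled, dtype=float)            # work on a copy
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((Y.shape[0], k))       # random initial loadings
    for _ in range(p):
        X = np.linalg.solve(C.T @ C, C.T @ Y)      # E-step
        C = Y @ X.T @ np.linalg.inv(X @ X.T)       # M-step
        Y_new = C @ X                              # reconstructed sub-block
        change = np.linalg.norm(Y_new[missing] - Y[missing])
        Y[missing] = Y_new[missing]                # overwrite only the blanks
        if change <= gamma:                        # estimates almost unchanged
            break
    return Y
```

Observed entries are never modified; only the positions flagged in `missing` are overwritten at each pass.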

T_{b}^{2} = X_{b}^{*} \cdot X_{b}^{* T}

(37)

Q_{b} = ‖ (I - C_{b}^{*} \cdot C_{b}^{* T}) \cdot y_{b} ‖^{2}

(38)

Meanwhile, the confidence limits of T² and Q are given by equations (5) and (7) using variables without missing data. The monitoring result of each sub-block is obtained by comparing the statistics with their confidence limits, and the fault detection result of the large-scale process is then obtained by fusing this information.

Here, the Bayesian inference strategy³² is used to fuse the information of the sub-blocks. It calculates the fault probability of T² and Q in each monitoring block by the following expressions⁷

P (F | y_{b}) = \frac{P (y_{b} | F) P (F)}{P (y_{b})}

(39)

P (y_{b}) = P (y_{b} | N) P (N) + P (y_{b} | F) P (F)

(40)

where “N” denotes the normal condition and “F” denotes the fault condition, $y_{b}$ is the sample of the bth sub-block, P(N) is the prior probability of the normal condition, and the conditional probabilities $P (y_{b} | N)$ and $P (y_{b} | F)$ can be calculated by

P (y_{b} | N) = e^{- T_{b, new}^{2} / T_{b th}^{2}}, P (y_{b} | F) = e^{- T_{b th}^{2} / T_{b, new}^{2}}

(41)

The conditional probabilities $P (y_{b} | N)$ and $P (y_{b} | F)$ for the Q statistic are obtained in the same way. Therefore, the final Bayesian inference comprehensive (BIC) statistic is given as

BIC = \sum_{b = 1}^{B} {\frac{P (y_{b} | F) P (F | y_{b})}{\sum_{j = 1}^{B} P (y_{j} | F)}}

(42)

According to the statistic BIC, the fault detection results of large-scale monitoring process can be obtained effectively. Thus, monitoring performance can be improved in distributed PCA monitoring process by Bayesian inference strategy.
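Equations (39)–(42) can be illustrated by the following plain-Python sketch for one statistic (T² or Q). The function names are hypothetical, and the prior probabilities are assumed to be P(F) = α and P(N) = 1 − α, which the text does not state explicitly. Each statistic value must be strictly positive.

```python
import math

def fault_probability(stat, limit, alpha=0.01):
    """Return (P(F | y_b), P(y_b | F)) for one sub-block, eqs. (39)-(41).

    stat: new statistic value (> 0), limit: its control limit.
    Priors P(F) = alpha and P(N) = 1 - alpha are assumptions.
    """
    p_y_n = math.exp(-stat / limit)            # P(y_b | N)
    p_y_f = math.exp(-limit / stat)            # P(y_b | F)
    p_n, p_f = 1.0 - alpha, alpha
    posterior = p_y_f * p_f / (p_y_n * p_n + p_y_f * p_f)
    return posterior, p_y_f

def bic(stats, limits, alpha=0.01):
    """Fuse the sub-block results into the BIC statistic, eq. (42)."""
    pairs = [fault_probability(s, l, alpha) for s, l in zip(stats, limits)]
    total = sum(p_y_f for _, p_y_f in pairs)
    return sum(p_y_f * post for post, p_y_f in pairs) / total
```

Because BIC is a weighted average of the sub-block posteriors, it stays between 0 and 1, and a single sub-block far above its limit dominates the fused value.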

In summary, in the proposed DEM-PCA scheme, the process variables are first divided twice, by the knowledge-based and then the GDC-based method, which establishes a two-layer framework for large-scale process monitoring. The data set is thus divided into several monitoring sub-block data sets; complete sample sets are then obtained by estimating the missing data with the EM-PCA method, and each sub-block gives its own data-driven monitoring result. A comprehensive monitoring result is obtained by Bayesian inference. Figure 1 shows the flow chart of the DEM-PCA scheme.

Figure 1.

The flow chart of the DEM-PCA scheme.

Case study

In this section, the effectiveness of the proposed method is demonstrated on the TE benchmark process. Figure 2 shows the schematic diagram of the TE process,^17,21 which has five units: a reactor, a compressor, a stripper, a condenser, and a separator. There are 41 measured variables (22 continuous process variables and 19 composition variables) and 11 manipulated variables in the TE monitoring process.⁹ A total of 960 sampling points are collected, of which the first 160 are in the normal operation state.

Figure 2.

TE process scheme.

TE process decomposition

Before constructing the PCA model of the whole large-scale TE process, the process is divided into several blocks based on prior knowledge in the first layer. Each block describes a physical or chemical operation or the performance of a sub-system unit. In this study, only 33 variables are used for process monitoring; they are listed in Table 1. According to the physical structure of the TE process, there are five major units, but two of the five major units have only two measured variables, so the variables of these two units are merged into other related blocks, which makes the monitoring process less complex. As a result, the 33 measured variables of the TE process are grouped into three blocks that are monitored separately. Then, the variables in each first-layer block are further divided in the second layer by the GDC clustering method of section “Process decomposition.” For example, block 1 contains variables 1–9, 21, 23–26, and 32; the GDC parameters are calculated, and the variables with larger GDC values are combined into a sub-block. Therefore, in the second layer, the variables in each sub-block have minimal correlation with those in other sub-blocks. The specific division of the measured variables of the TE process is given in Table 2.

Table 1.

Process variables in the TE process.

Variable no.	Process measurements
1	A feed (stream 1)
2	D feed (stream 2)
3	E feed (stream 3)
4	Total feed
5	Recycle flow (stream 8)
6	Reactor feed rate (stream 6)
7	Reactor pressure
8	Reactor level
9	Reactor temperature
10	Purge rate (stream 9)
11	Product separator temperature
12	Product separator level
13	Product separator pressure
14	Product separator underflow (stream 10)
15	Stripper level
16	Stripper pressure
17	Stripper underflow (stream 11)
18	Stripper temperature
19	Stripper steam flow
20	Compressor work
21	Reactor cooling water outlet temperature
22	Separator cooling water outlet temperature
23	D feed flow valve (stream 2)
24	E feed flow valve (stream 3)
25	A feed flow valve (stream 1)
26	Total feed flow valve (stream 4)
27	Compressor recycle valve
28	Purge valve (stream 9)
29	Separator pot liquid flow valve (stream 10)
30	Stripper liquid product flow valve (stream 11)
31	Stripper steam valve
32	Reactor cooling water flow
33	Condenser cooling water flow

TE: Tennessee Eastman.

Table 2.

Division of measured variables.

Block	Sub-block	Variables
Block 1 (feed stream and reactor)	Sub-block 1	$x_{1}, x_{26}$
	Sub-block 2	$x_{2}, x_{4}, x_{5}, x_{32}$
	Sub-block 3	$x_{3}, x_{6} - x_{9}, x_{21}, x_{23} - x_{25}$
Block 2 (separator and compressor)	Sub-block 4	$x_{10} - x_{13}, x_{27} - x_{29}$
	Sub-block 5	$x_{14}, x_{20}, x_{22}$
Block 3 (stripper and condenser)	Sub-block 6	$x_{15}, x_{16}, x_{30}, x_{31}$
	Sub-block 7	$x_{17} - x_{19}, x_{33}$
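The division in Table 2 can be written down as a small lookup structure, sketched below in plain Python. The names `SUB_BLOCKS` and `split_by_sub_block` are hypothetical, and the data are assumed to be stored with one row per variable in the order of Table 1. The seven sub-blocks cover variables 1 to 33 exactly once.

```python
# Variable numbers (Table 1) assigned to each second-layer sub-block (Table 2).
SUB_BLOCKS = {
    1: [1, 26],
    2: [2, 4, 5, 32],
    3: [3, 6, 7, 8, 9, 21, 23, 24, 25],
    4: [10, 11, 12, 13, 27, 28, 29],
    5: [14, 20, 22],
    6: [15, 16, 30, 31],
    7: [17, 18, 19, 33],
}

def split_by_sub_block(Y, sub_blocks=SUB_BLOCKS):
    """Split Y (33 rows, one per variable in Table 1 order) into sub-block data sets."""
    return {b: [Y[v - 1] for v in idx] for b, idx in sub_blocks.items()}
```

Each returned entry holds the rows of one sub-block and can be passed to the sub-block monitoring model on its own.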

The normal testing data set consists of 500 samples, and the process decomposition is based on the training data. Each fault testing data set consists of 960 samples; each of the 21 faults is introduced at the 161st sample and persists for the remaining 800 samples. The 21 faults of the TE process and their types are shown in Table 3. The “Type” column in Table 3 gives the variation type of each fault signal. The first seven faults are step changes, faults 8–12 are random variations, and fault 13 is a slow drift. Faults 14 and 15 are caused by sticking valves. Faults 16–20 have unknown causes, and fault 21 is a valve held at a constant position.

Table 3.

Process faults of the TE process.

Fault no.	Variables	Type
1	A/C feed ratio, B composition constant (stream 4)	Step
2	B composition, A/C ratio constant (stream 4)	Step
3	D feed temperature (stream 2)	Step
4	Reactor cooling water inlet temperature	Step
5	Condenser cooling water inlet temperature	Step
6	A feed loss (stream 1)	Step
7	C header pressure loss-reduced availability (stream 4)	Step
8	A, B, C feed composition (stream 4)	Random variation
9	D feed temperature (stream 2)	Random variation
10	C feed temperature (stream 4)	Random variation
11	Reactor cooling water inlet temperature	Random variation
12	Condenser cooling water inlet temperature	Random variation
13	Reaction kinetics	Slow drift
14	Reactor cooling water valve	Sticking
15	Condenser cooling water valve	Sticking
16	Unknown	Unknown
17	Unknown	Unknown
18	Unknown	Unknown
19	Unknown	Unknown
20	Unknown	Unknown
21	Valve position constant (stream 4)	Constant position

TE: Tennessee Eastman.

Fault detection performance comparison

The measured variables are divided into seven sub-blocks for fault detection. To assess the effect of this division, the performance indices calculated from the T² and Q statistics are compared between global monitoring and distributed monitoring.

In large-scale industrial processes, missing data typically arise from sensor errors, information transmission errors, and multi-rate sampling. Two missing-data cases are considered below, and four methods (PCA, EM-PCA, the MI-based distributed PCA, and the distributed EM-PCA) are applied to process monitoring in each case.

Case 1: 5% of the 960 samples (48 samples) contain missing data, but no sample row is missing all of its data, even within a sub-block.

Case 2: 10% of the samples (96 samples) contain missing data, and no sample is missing all of its information. However, within some sub-blocks an entire sample row may be missing at a given sampling time.
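The two cases can be simulated with a random missing-data mask. The sketch below is only one possible construction: the paper does not state how the missing entries were positioned, so the random placement, the cap of five missing entries per row, the seed, and all names (`missing_mask`, `BLOCKS`) are assumptions made for illustration.

```python
import numpy as np

# Sub-blocks of Table 2 as 0-based column indices.
BLOCKS = [
    [0, 25],
    [1, 3, 4, 31],
    [2, 5, 6, 7, 8, 20, 22, 23, 24],
    [9, 10, 11, 12, 26, 27, 28],
    [13, 19, 21],
    [14, 15, 29, 30],
    [16, 17, 18, 32],
]


def missing_mask(n_samples, n_vars, n_rows_missing, blocks=None,
                 max_per_row=5, seed=0):
    """Return a boolean mask (True = missing entry).

    Exactly `n_rows_missing` rows receive between 1 and `max_per_row`
    missing entries. If `blocks` is given, a row is redrawn until no
    sub-block loses all of its variables (the case 1 condition).
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_samples, n_vars), dtype=bool)
    rows = rng.choice(n_samples, size=n_rows_missing, replace=False)
    for r in rows:
        while True:
            k = int(rng.integers(1, max_per_row + 1))
            cols = rng.choice(n_vars, size=k, replace=False)
            dropped = set(cols.tolist())
            # Reject the draw if some sub-block would be entirely missing.
            if blocks is None or all(not set(b) <= dropped for b in blocks):
                break
        mask[r, cols] = True
    return mask


case1 = missing_mask(960, 33, 48, blocks=BLOCKS)  # 5% of rows, no empty sub-block row
case2 = missing_mask(960, 33, 96)                 # 10% of rows, sub-block rows may be empty
```

Applying a mask to a data matrix `X` is then a matter of `np.where(mask, np.nan, X)`, after which the missing entries can be passed to the estimation step.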

Monitoring performance is evaluated using two indices, the false alarm rate (FAR) and the fault detection rate (FDR),³² which are commonly used to judge fault detection results.

The FAR and FDR of the $T^{2}$ statistic can be written in general form as follows

\mathrm{FAR} = \frac{\text{number of samples } (T^{2} > T_{\lim}^{2} \mid \text{without related fault})}{\text{total fault-free samples}} \times 100\%

(43)

\mathrm{FDR} = \frac{\text{number of samples } (T^{2} > T_{\lim}^{2} \mid \text{with related fault})}{\text{total faulty samples}} \times 100\%

(44)

where $T^{2}$ denotes the $T^{2}$ statistic of the PCA method and $T_{\lim}^{2}$ denotes its threshold.

Similarly, the FAR and FDR of the Q statistic can be calculated as follows

\mathrm{FAR} = \frac{\text{number of samples } (Q > Q_{\lim} \mid \text{without related fault})}{\text{total fault-free samples}} \times 100\%

(45)

\mathrm{FDR} = \frac{\text{number of samples } (Q > Q_{\lim} \mid \text{with related fault})}{\text{total faulty samples}} \times 100\%

(46)

where Q denotes the Q statistic of the PCA method and $Q_{\lim}$ denotes the threshold of the Q statistic.
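Equations (43)–(46) amount to counting threshold violations before and after the fault onset. A minimal Python sketch is given below; it assumes the statistic is available as one value per sample and that the fault onset is known as a 0-based index (160 for a fault introduced at the 161st sample). The function name `far_fdr` and the synthetic values are illustrative only.

```python
import numpy as np


def far_fdr(stat, limit, fault_start):
    """Return (FAR, FDR) in percent for a T^2 or Q sequence.

    stat        : 1-D sequence of statistic values, one per sample
    limit       : control limit (threshold) of the statistic
    fault_start : 0-based index of the first faulty sample
    """
    alarms = np.asarray(stat, dtype=float) > limit
    far = 100.0 * alarms[:fault_start].mean()  # alarms among fault-free samples
    fdr = 100.0 * alarms[fault_start:].mean()  # alarms among faulty samples
    return far, fdr


# Synthetic illustration: 960 samples, fault present from index 160 onward.
stat = np.zeros(960)
stat[160:] = 2.0     # faulty samples exceed the limit ...
stat[160:240] = 0.0  # ... except 80 missed detections
stat[:8] = 2.0       # 8 false alarms among the 160 fault-free samples
far, fdr = far_fdr(stat, limit=1.0, fault_start=160)
```

In this synthetic example, 8 of 160 fault-free samples and 720 of 800 faulty samples exceed the limit. Note that Tables 4 and 5 appear to list the rates as fractions (for example 0.95) rather than as percentages.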

In this way, the FAR and FDR are evaluated for all 21 faults. The FAR and FDR of traditional PCA, EM-PCA, the MI-based distributed PCA, and the distributed EM-PCA are given in Tables 4 and 5, respectively, which illustrate the performance of distributed monitoring for the large-scale process.

Table 4.

FAR of PCA, EM-PCA, MI-based distributed PCA and distributed EM-PCA (case 1/case 2).

Fault no.	PCA		EM-PCA		MI-based DPCA		DEM-PCA
	$T^{2}$	Q	$T^{2}$	Q	$BIC_{T^{2}}$	$BIC_{Q}$	$BIC_{T^{2}}$	$BIC_{Q}$
1	0.06/0.06	0.04/0.04	0.06/0.06	0.04/0.03	0.02/0.01	0.02/0.01	0.03/0.01	0.03/0.05
2	0.03/0.03	0.08/0.06	0.03/0.03	0.08/0.07	0.07/0.04	0.07/0.05	0.04/0.02	0.08/0.11
3	0.01/0.01	0.14/0.14	0.01/0.01	0.14/0.14	0.19/0.09	0.21/0.08	0.10/0.09	0.09/0.11
4	0.03/0.03	0.04/0.05	0.03/0.03	0.04/0.05	0.07/0.03	0.10/0.14	0.03/0.03	0.09/0.08
5	0.03/0.03	0.04/0.05	0.03/0.03	0.04/0.06	0.07/0.03	0.10/0.06	0.03/0.03	0.11/0.06
6	0.01/0.01	0.03/0.03	0.01/0.01	0.04/0.03	0.07/0.03	0.15/0.06	0.04/0.03	0.06/0.01
7	0.02/0.02	0.01/0.02	0.02/0.02	0.01/0.02	0.04/0.04	0.07/0.06	0.03/0.03	0.07/0.03
8	0.04/0.04	0.03/0.03	0.04/0.04	0.03/0.03	0.06/0.05	0.06/0.06	0.08/0.08	0.03/0.03
9	0.13/0.13	0.06/0.05	0.13/0.13	0.06/0.06	0.20/0.26	0.17/0.21	0.08/0.04	0.10/0.08
10	0.02/0.01	0.08/0.08	0.02/0.01	0.08/0.08	0.02/0.02	0.05/0.05	0.07/0.01	0.06/0.09
11	0.06/0.06	0.08/0.07	0.06/0.06	0.08/0.07	0.04/0.04	0.08/0.07	0.01/0.03	0.09/0.03
12	0.03/0.04	0.07/0.07	0.03/0.04	0.07/0.07	0.11/0.10	0.11/0.09	0.01/0.03	0.04/0.03
13	0.01/0.01	0.03/0.03	0.01/0.01	0.03/0.03	0.01/0.01	0.06/0.06	0.02/0.01	0.03/0.02
14	0.03/0.02	0.11/0.10	0.03/0.03	0.11/0.10	0.02/0.02	0.04/0.04	0.08/0.03	0.04/0.06
15	0.02/0.01	0.07/0.06	0.02/0.01	0.07/0.07	0.03/0.01	0.04/0.05	0.01/0.03	0.01/0.04
16	0.21/0.18	0.09/0.09	0.21/0.20	0.09/0.09	0.20/0.32	0.17/0.14	0.03/0.08	0.10/0.06
17	0.02/0.02	0.08/0.08	0.02/0.02	0.08/0.09	0.13/0.13	0.06/0.06	0.06/0.01	0.04/0.02
18	0.04/0.04	0.06/0.06	0.04/0.04	0.06/0.06	0.03/0.03	0.03/0.04	0.04/0.04	0.06/0.07
19	0.02/0.02	0.04/0.04	0.02/0.03	0.04/0.04	0.06/0.04	0.03/0.02	0.05/0.04	0.02/0.04
20	0.01/0.01	0.04/0.04	0.01/0.01	0.04/0.04	0.01/0.01	0.02/0.01	0.01/0.03	0.08/0.04
21	0.07/0.05	0.14/0.12	0.07/0.06	0.14/0.14	0.12/0.11	0.16/0.15	0.09/0.09	0.14/0.11

FAR: false alarm rate; PCA: principal component analysis; DPCA: distributed PCA; EM-PCA: expectation maximization-principal component analysis; MI: mutual information; DEM-PCA: distributed expectation maximization-principal component analysis; BIC: Bayesian inference comprehensive.

Table 5.

FDR of PCA, EM-PCA, MI-based distributed PCA and distributed EM-PCA (case 1/case 2).

Fault no.	PCA		EM-PCA		MI-based DPCA		DEM-PCA
	$T^{2}$	Q	$T^{2}$	Q	$BIC_{T^{2}}$	$BIC_{Q}$	$BIC_{T^{2}}$	$BIC_{Q}$
1	0.95/0.90	0.95/0.90	0.99/1.00	1.00/1.00	0.95/0.90	0.95/0.90	1.00/0.99	1.00/1.00
2	0.95/0.88	0.95/0.90	0.99/0.99	1.00/1.00	0.94/0.89	0.93/0.88	0.99/0.99	0.99/0.99
3	0.10/0.09	0.41/0.38	0.10/0.10	0.41/0.41	0.10/0.09	0.41/0.40	0.21/0.11	0.42/0.38
4	0.65/0.39	0.95/0.90	0.68/0.68	1.00/1.00	0.85/0.81	0.95/0.90	0.97/0.94	1.00/0.99
5	0.31/0.29	0.62/0.57	0.32/0.32	0.64/0.64	0.48/0.42	0.95/0.90	0.52/0.54	0.99/0.99
6	0.95/0.89	0.95/0.90	0.99/0.99	1.00/1.00	0.95/0.90	0.95/0.90	1.00/1.00	1.00/1.00
7	0.95/0.90	0.95/0.90	1.00/1.00	1.00/1.00	0.95/0.90	0.90/0.85	1.00/1.00	1.00/1.00
8	0.93/0.88	0.94/0.89	0.98/0.98	0.99/0.99	0.94/0.89	0.90/0.86	1.00/0.99	0.99/1.00
9	0.11/0.10	0.39/0.37	0.12/0.11	0.39/0.39	0.20/0.19	0.35/0.34	0.27/0.22	0.38/0.39
10	0.51/0.49	0.81/0.77	0.53/0.53	0.84/0.85	0.60/0.58	0.87/0.82	0.72/0.72	0.97/0.97
11	0.60/0.56	0.83/0.78	0.62/0.61	0.87/0.86	0.66/0.61	0.81/0.76	0.84/0.84	0.93/0.94
12	0.94/0.89	0.95/0.90	0.99/0.99	1.00/1.00	0.95/0.90	0.95/0.90	1.00/1.00	1.00/1.00
13	0.91/0.86	0.92/0.87	0.96/0.96	0.97/0.97	0.95/0.87	0.95/0.87	0.96/0.97	0.98/0.98
14	0.95/0.90	0.95/0.90	1.00/0.98	1.00/1.00	0.95/0.90	0.94/0.89	1.00/1.00	1.00/1.00
15	0.15/0.13	0.41/0.39	0.15/0.14	0.42/0.41	0.20/0.19	0.31/0.30	0.17/0.23	0.31/0.35
16	0.35/0.34	0.76/0.73	0.38/0.38	0.80/0.80	0.70/0.66	0.90/0.85	0.78/0.47	1.00/0.88
17	0.80/0.76	0.94/0.89	0.85/0.85	0.98/0.98	0.88/0.83	0.93/0.89	0.92/0.95	0.99/0.98
18	0.86/0.81	0.91/0.86	0.90/0.90	0.95/0.96	0.90/0.85	0.89/0.84	0.94/0.95	0.95/0.96
19	0.18/0.17	0.71/0.68	0.19/0.18	0.73/0.73	0.18/0.17	0.28/0.25	0.23/0.19	0.43/0.65
20	0.49/0.46	0.79/0.76	0.51/0.51	0.83/0.83	0.79/0.75	0.79/0.75	0.81/0.78	0.89/0.90
21	0.42/0.40	0.73/0.69	0.44/0.44	0.75/0.76	0.49/0.58	0.74/0.71	0.59/0.62	0.86/0.88

FDR: fault detection rate; PCA: principal component analysis; DPCA: distributed PCA; EM-PCA: expectation maximization-principal component analysis; MI: mutual information; DEM-PCA: distributed expectation maximization-principal component analysis; BIC: Bayesian inference comprehensive.

In both cases, most FARs are no more than 10%, apart from a few faults that are hard to monitor; for most faults, the MI-based distributed PCA method yields a higher FAR than the distributed EM-PCA method. Table 4 therefore indicates that the proposed distributed EM-PCA method detects faults effectively without excessive false alarms. Moreover, the comparison of FDR in Table 5 indicates that the proposed DEM-PCA method outperforms the other three monitoring methods under most fault conditions, especially for faults 4, 5, 10, 11, and 21. For some faults the FDR improvement is not evident; even so, the proposed method requires less computation than the global PCA and EM-PCA methods and gives more accurate monitoring results on an incomplete data set. In other words, the distributed EM-PCA monitoring strategy reduces the complexity of the sample data and improves computational efficiency.

Distributed EM-PCA for monitoring process with missing data

Based on Table 5, faults 4, 5, 10, and 21 are selected to present their fault detection results in detail. The four faults are split between the two cases: faults 4 and 5 are examined in case 1, and faults 10 and 21 in case 2, so that both cases are covered by the simulations. In all monitoring results shown in Figures 3–6, a red “*” marks the estimated value of a missing data point.

Figure 3.

Monitoring results for fault 4 in case 1: (a) EM-PCA and (b) distributed EM-PCA.

Figure 4.

Monitoring results for fault 5 in case 1: (a) EM-PCA and (b) distributed EM-PCA.

Figure 5.

Monitoring results for fault 10 in case 2: (a) EM-PCA and (b) distributed EM-PCA.

Figure 6.

Monitoring results for fault 21 in case 2: (a) EM-PCA and (b) distributed EM-PCA.

Fault 4 is a step change in the reactor cooling water inlet temperature; the monitoring performance of EM-PCA and the proposed distributed EM-PCA is shown in Figure 3. The T² statistic of EM-PCA clearly performs worse than that of distributed EM-PCA, and its FDR is low (0.68 versus 0.97 for case 1 in Table 5). In other words, the proposed distributed EM-PCA method detects fault 4 effectively and quickly with the T² statistic. Meanwhile, the Q statistic of distributed EM-PCA performs as well as that of EM-PCA.

Fault 5 is a step change in the condenser cooling water inlet temperature. Figure 4 shows the monitoring results for fault 5 obtained with traditional EM-PCA and the proposed distributed EM-PCA method. When fault 5 occurs, both methods detect it at the 161st sample, but only the proposed distributed EM-PCA method still detects it after the 400th sample with the Q statistic. Thus, the Q-statistic FDR of the proposed method is much higher than that of traditional EM-PCA (0.99 versus 0.64 in Table 5), while its T² statistic behaves almost the same as that of traditional EM-PCA. The proposed distributed EM-PCA method can therefore handle fault detection with missing data effectively while also improving process monitoring performance.

Fault 10 is a random variation in the C feed temperature (stream 4), and Figure 5 shows the monitoring results in case 2 for traditional EM-PCA and the proposed distributed EM-PCA. The $BIC_{Q}$ statistic constructed by the proposed distributed EM-PCA detects this fault almost completely even with 10% missing data. The monitoring performance of the proposed DEM-PCA is evidently better than that of standard EM-PCA.

Fault 21 is a constant-position fault of the stream 4 valve in the TE process. The monitoring performance of the two methods in case 2 is shown in Figure 6. With traditional EM-PCA, the T² and Q statistics indicate an abnormality only at around the 600th sample, whereas the proposed DEM-PCA method detects this fault at around the 400th sample. In addition, the FDR of the proposed DEM-PCA is clearly higher than that of traditional EM-PCA. Therefore, the proposed DEM-PCA monitors this fault better than traditional EM-PCA.

In summary, Figures 3–6 show that the distributed EM-PCA method achieves higher fault detection rates than the traditional EM-PCA method. The distributed EM-PCA algorithm thus provides an effective estimate of the missing values, yielding detection results closer to those obtained with complete data. Furthermore, the distributed EM-PCA method is more sensitive to faults than the traditional EM-PCA method and detects them more rapidly.
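The fused BIC statistics reported in Tables 4 and 5 combine the sub-block statistics into a single index per sample. The sketch below shows one fusion rule that is commonly used in the distributed monitoring literature; it assumes exponential likelihoods and a prior fault probability equal to the significance level `alpha`, and it may differ in detail from the formulation given earlier in this paper. The function name `bic_fusion` and its arguments are illustrative.

```python
import numpy as np


def bic_fusion(stats, limits, alpha=0.01):
    """Fuse per-sub-block statistics of one sample into a single index.

    stats  : statistic (T^2 or Q) of each sub-block, strictly positive
    limits : control limit of each sub-block
    alpha  : significance level, used here as the prior fault probability

    The sample is flagged as faulty when the returned index exceeds alpha.
    """
    stats = np.asarray(stats, dtype=float)
    limits = np.asarray(limits, dtype=float)
    p_x_normal = np.exp(-stats / limits)  # assumed likelihood under normal operation
    p_x_fault = np.exp(-limits / stats)   # assumed likelihood under a fault
    # Posterior fault probability of each sub-block by Bayes' rule.
    evidence = p_x_normal * (1.0 - alpha) + p_x_fault * alpha
    posterior = p_x_fault * alpha / evidence
    # Weight each posterior by its fault likelihood and normalize.
    return float(np.sum(p_x_fault * posterior) / np.sum(p_x_fault))
```

With this rule, a sample whose sub-block statistics all sit exactly on their limits yields an index equal to `alpha`, so the decision threshold of the fused statistic is simply the significance level.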

To further diagnose the process condition, the contributions of the variables are compared with each other to find the one or several variables whose contributions are larger than the others; these variables are the likely cause of the abnormal process condition. For case 2, the variable contributions for fault 5 are presented in Figure 7 and those for fault 11 in Figure 8.

Figure 7.

Monitoring results and contribution plots for fault 5: (a) a global PCA, (b) DEM-PCA sub-block 2, and (c) DEM-PCA sub-block 3.

Figure 8.

Monitoring results and contribution plots for fault 11: (a) a global PCA, (b) DEM-PCA sub-block 2, and (c) DEM-PCA sub-block 3.

Figure 7(a) shows the contributions of the 33 variables to the T² and Q statistics of the global PCA. Many variables have high contributions in the histogram, so it is difficult to select the few variables responsible for fault 5. In contrast, distributed EM-PCA provides monitoring results and contribution plots separately for each of the seven sub-blocks. Figure 7(b) shows the result of sub-block 2 and Figure 7(c) that of sub-block 3, from which variables 2, 7, 8, and 32 are found to have dominant contributions in both the global monitor and the sub-block monitors. Therefore, according to the fault diagnosis method in this paper, variables 2, 7, 8, and 32 are responsible for fault 5, which is identified successfully in the sub-block models.

Similarly, the T² and Q contributions of the global PCA for fault 11 in Figure 8(a) do not yield a clear fault diagnosis result, whereas the sub-block 2 contribution plots in Figure 8(b) and the sub-block 3 contribution plots in Figure 8(c) indicate that variables 6, 9, and 32 are responsible for fault 11.
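Contribution plots of this kind rank the variables by their share of a monitoring statistic. For the Q statistic, a common definition takes the squared residual of each variable as its contribution. The sketch below assumes that definition and a PCA model fitted by SVD on standardized training data. The synthetic data set and the names `fit_pca` and `q_contributions` are illustrative, and the contribution formula used in this paper may differ.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Fit PCA on standardized data; return mean, std and loading matrix P."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T  # P has shape (n_vars, n_components)


def q_contributions(x, mean, std, P):
    """Per-variable contributions to Q for one sample (they sum to Q)."""
    z = (x - mean) / std
    residual = z - P @ (P.T @ z)  # part of z not explained by the PCA model
    return residual ** 2


# Synthetic normal data: two correlated pairs plus one independent variable.
rng = np.random.default_rng(0)
t1, t2 = rng.normal(size=(2, 500))
base = np.column_stack([t1, t1, t2, t2, rng.normal(size=500)])
X_train = base + 0.1 * rng.normal(size=(500, 5))
mean, std, P = fit_pca(X_train, n_components=2)

# Inject a bias of ten standard deviations on the last variable (index 4).
x_fault = X_train[0].copy()
x_fault[4] += 10.0 * std[4]
contrib = q_contributions(x_fault, mean, std, P)
```

In this constructed example the biased variable dominates the contributions. In a sub-block model the same calculation runs over only a handful of variables, which is why the sub-block plots are easier to read than the 33-variable global plot.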

Conclusion

In this paper, a distributed EM-PCA scheme is proposed for plant-wide industrial processes with missing data. The large-scale process is decomposed by combining a knowledge-based method with a GDC-based method. The EM-PCA algorithm is then applied in every sub-block to estimate the missing values and to obtain the fault detection results of each sub-block. Bayesian inference is used to fuse the monitoring results of the sub-blocks into a final decision. Furthermore, fault diagnosis results are obtained by calculating the variable contributions in the relevant sub-blocks. The proposed approach improves fault detection performance and reduces the size of the monitoring data set, and case studies on the TE benchmark have validated its effectiveness.

Footnotes

Appendix 1 Acknowledgements

The authors would like to thank the National Natural Science Foundation of China, the Fundamental Research Funds for the Central Universities, and the National Key R&D Program of China.

Handling Editor: Choon Ki Ahn

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Natural Science Foundation of China under grants #61673053 and #61903026, the Fundamental Research Funds for the Central Universities under grant #FRF-BD-19-002A, and the National Key R&D Program of China under grant #2017YFB0306403.

ORCID iD

Xu Yang

References

1. Ding. Data-driven design of fault diagnosis and fault-tolerant control systems. London: Springer, 2014.

2. Zhao, Huang, Shmaliy. Bayesian state estimation on finite horizons: the case of linear state–space model. Automatica 2017; 85: 91–99.

3. Zhu, Song. Distributed parallel PCA for modeling and monitoring of large-scale processes with big data. IEEE T Ind Inform 2017; 13(4): 1877–1885.

4. Lie. Analytic hierarchy process based fuzzy decision fusion system for model prioritization and process monitoring application. IEEE T Ind Inform 2018: 357–365.

5. Wang, Jiang. Data-driven optimized distributed dynamic PCA for efficient monitoring of large-scale dynamic processes. IEEE Access 2017; 5: 18325–18333.

6. Zhang, Gonzalez, Huang, et al. Expectation–maximization approach to fault diagnosis with missing data. IEEE T Ind Electron 2015; 62(2): 1231–1240.

7. Jiang, Huang. Distributed monitoring for large-scale processes based on multivariate statistical analysis and Bayesian method. J Process Contr 2016; 46: 75–83.

8. De Angelo, Bossio, Giaccone, et al. Online model-based stator-fault detection and identification in induction motors. IEEE T Ind Electron 2009; 56(11): 4671–4680.

9. Huang, Tan, Lee. Fault diagnosis and fault-tolerant control in linear drives using the Kalman filter. IEEE T Ind Electron 2012; 59(11): 4285–4292.

10. Wang. Fault diagnosis for nonlinear systems via neural networks and parameter estimation. In: Proceedings of 2005 international conference on control and automation, Budapest, 26–29 June 2005, vol. 1, pp.559–563. New York: IEEE.

11. Alrowaie, Gopaluni, Kwok. Fault detection and isolation in stochastic non-linear state-space models using particle filters. Control Eng Pract 2012; 20(10): 1016–1032.

12. Zhao, Huang. Probabilistic monitoring of sensors in state-space with variational Bayesian inference. IEEE T Ind Electron 2018; 66: 2154–2163.

13. Zhao, Ding, Karimi, et al. On robust Kalman filter for two-dimensional uncertain linear discrete time-varying systems: a least squares method. Automatica 2019; 99: 203–212.

14. Qin, Zhou. Geometric properties of partial least squares for process monitoring. Automatica 2010; 46(1): 204–210.

15. Kourti, Macgregor. Multivariate SPC methods for process and product monitoring. J Qual Technol 1996; 28(4): 409–428.

16. Liu, Qin, Chai. Multiblock concurrent PLS for decentralized monitoring of continuous annealing processes. IEEE T Ind Electron 2014; 61(11): 6429–6437.

17. Zhang, Zhou, Qin, et al. Decentralized fault diagnosis of large-scale processes using multiblock kernel partial least squares. IEEE T Ind Inform 2010; 6(1): 3–10.

18. Downs, Vogel. A plant-wide industrial process control problem. Comput Chem Eng 1993; 17(3): 245–255.

19. Jiang, Yan, Huang. Performance-driven distributed PCA process monitoring based on fault-relevant variable selection and Bayesian inference. IEEE T Ind Electron 2015; 63(1): 377–386.

20. Zhu, Song, et al. Large-scale plant-wide process modeling and hierarchical monitoring: a distributed Bayesian network approach. J Process Contr 2018; 65: 91–106.

21. Chen. Plant-wide industrial process monitoring: a distributed modeling framework. IEEE T Ind Inform 2017; 12(1): 310–321.

22. Mcevoy, Wolthusen. A plant-wide industrial process control security problem. In: Butts, Shenoi (eds) Critical infrastructure protection V. Berlin: Springer, 2011, pp.47–56.

23. Jiang, Yan. Large-scale process monitoring based on mutual information–multiblock principal component analysis. ISA T 2014; 53(5): 1516–1527.

24. Jiang, Yan. Nonlinear plant-wide process monitoring using MI-spectral clustering and Bayesian inference-based multiblock KPCA. J Process Contr 2015; 32: 38–50.

25. Wang, Yan, Jiang, et al. Generalized Dice’s coefficient-based multi-block principal component analysis with Bayesian inference for plant-wide process monitoring. J Chemometr 2015; 29(3): 165–178.

26. Muteki, Macgregor, Ueda. Estimation of missing data using latent variable methods with auxiliary information. Chemometr Intell Lab 2005; 78(1–2): 41–50.

27. Newman. Missing data: five practical guidelines. Organ Res Methods 2014; 17(4): 372–411.

28. Dempster. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 1977; 39(1): 1–38.

29. Roweis. EM algorithms for PCA and SPCA. Adv Neur In 1997; 10: 626–632.

30. Zhou, Zhan, et al. A study of wind turbine comprehensive operational assessment model based on EM-PCA algorithm. IOP C Ser Earth Env 2018; 108: 052072.

31. Sun, Liu. Imputation of random missing data in chemical engineering process with EM-PCA. Comput Appl Chem 2013; 30(7): 735–738.

32. Zhang, Song. Nonlinear process monitoring based on linear subspace and Bayesian inference. J Process Contr 2010; 20(5): 676–688.