Process monitoring based on distributed principal component analysis with angle-relevant variable selection

Abstract

Multivariate statistics process monitoring can achieve dimensionality reduction and latent feature extraction on process variables. However, process variables without beneficial information may affect the monitoring performance. This article proposes a distributed principal component analysis method based on the angle-relevant variable selection for plant-wide process monitoring. The directions of principal components are utilized to construct the sub-blocks, where the variables in each sub-block are determined by angle. After establishing the principal component analysis model in each sub-block, the monitoring results are fused by Bayesian inference. The simulation results show that the proposed method can select the responsible variables effectively and enhance the monitoring performance.

Keywords

Distributed monitoring, principal component analysis, angle-relevant variable selection, Bayesian inference

Introduction

The monitoring and diagnosis of a chemical process is crucially important to ensure the safety of the process and the quality of the product.^1–7 With the rapid development of modern industries, large amounts of data are being collected, which poses a new challenge to multivariate statistics process monitoring (MSPM). In the area of MSPM, a number of approaches have been reported.^8–12 Among these methods, principal component analysis (PCA) is the most basic and widely used.^13,14 By projecting the data into two low-dimensional spaces (the principal component space (PCS) and the residual space (RS)), high-dimensional and correlated data can be handled effectively. However, the traditional PCA methods cannot serve well for complex plant-wide process monitoring.^15,16

In order to better monitor the plant-wide process, multi-block monitoring methods or distributed monitoring methods have drawn much attention. The number of measured variables is generally large in a plant-wide process and the correlations among variables are complex.^17–19 To deal with these problems, different methods have been developed.^20–22 For example, monitoring and diagnosis charts are developed to enhance the performance in each sub-block.²⁰ Several multi-block PCA and partial least squares (PLS) algorithms are provided in Westerhuis et al.²¹ and interpreted with a unified notation. With the same motivation, a detailed analysis of multi-block PCA and PLS algorithms is supplied in Qin et al.²²

In distributed monitoring, the block division is the primary step. Usually, the process knowledge is assumed to be known in the block division step, but it is not always available in practice. In this case, the data-based multi-block methods that conduct the block division automatically are of significant interest.^23,24 Because process knowledge and experience are limited, a distributed PCA (DPCA) is first proposed for plant-wide process monitoring in Ge and Song.²³ This method can automatically divide the original data into several sub-spaces/sub-blocks by different directions of principal components (PCs). Therefore, the local behavior of the process can be reflected by these sub-blocks and the monitoring performance can be improved by the fusion of the individual sub-block results. Subsequently, several methods are proposed for block division in a plant-wide process.^25–28 For example, a multi-block monitoring method is proposed by adopting the mutual information (MI) in Jiang and Yan²⁵ and Xu et al.,²⁶ a modified multi-block PCA algorithm is developed for extracting block scores in Tong and Yan,²⁷ and a dynamic decentralized PCA method is introduced for modeling and monitoring dynamic processes in Tong et al.²⁸ Recently, a parallel PCA-KPCA method and a performance-driven method are proposed to solve the nonlinear distributed monitoring problems.^29,30 However, in the DPCA method, the variables in the constructed residual sub-space/sub-block cannot be accurately selected by using the mean of the residual loading matrix, even though the contribution of variables to a PC can be indicated by each element of the loading vector.

As one of the effective tools of statistical theory, Bayesian inference has been successfully applied to distributed monitoring.^23,25,26,31,32 In order to fuse the monitoring results of all sub-blocks after obtaining the monitoring model in each sub-block, the DPCA methods utilize the Bayesian inference strategy to combine all sub-block results.^23,31 Furthermore, this strategy has been widely applied to the fusion stage of other methods, such as the MI-based distributed monitoring methods.^25,26 Fault diagnosis is another key step for distributed monitoring.^18,26,32,33 As a widely used fault diagnosis method, the contribution plot method has received attention.^18,26 Recently, Bayesian fault diagnosis is proposed to provide a timely diagnosis result for the whole process.³³ The above analysis shows that how to effectively integrate the results of each sub-block and diagnose the fault is critical.

This article proposes a DPCA method based on the angle-relevant variable selection to monitor plant-wide processes, named ABPCA. In the proposed ABPCA, the sub-blocks are obtained through different PC directions, which are uncorrelated with each other. Considering the influence of variables, the subset in each sub-block is selected by the angle between the variables and the corresponding PC. To overcome the limitation of DPCA, the angle between the variables and the residual sub-space/sub-block is computed to determine the variables in the residual sub-space/sub-block. After that, the PCA model is established in each sub-block and the final monitoring results of all sub-blocks are fused by the Bayesian inference. Finally, a modified contribution plot method is constructed to diagnose the responsible variables after the fault is detected.

The rest of the article is organized as follows. The PCA monitoring method is briefly introduced, and then the proposed ABPCA method is presented. Next, the performance of the ABPCA method is validated by a numerical simulation and the Tennessee Eastman (TE) benchmark process. Finally, the conclusions are drawn.

A description of a PCA-based process monitoring

Considering a dataset $X \in R^{n \times m}$ , which includes $m$ variables and $n$ samples, the matrix $X$ can be explained by^2,34

X = \hat{T} {\hat{P}}^{T} + \bar{T} {\bar{P}}^{T} = \hat{T} {\hat{P}}^{T} + E

(1)

where $\hat{P} \in R^{m \times k}$ is the loading matrix of the PCS and $\bar{P} \in R^{m \times (m - k)}$ is the loading matrix of the RS, and $\hat{T} \in R^{n \times k}$ and $\bar{T} \in R^{n \times (m - k)}$ represent the score matrices of the PCS and RS, respectively. $k$ is the number of PCs, which is determined by the cumulative percentage variance (CPV) rule. $E \in R^{n \times m}$ is the residual matrix.

For an online sample $x$ , the traditional monitoring statistics $T^{2}$ and $Q$ can be obtained

T^{2} = x^{T} \hat{P} Λ^{- 1} {\hat{P}}^{T} x \leq T_{\lim}^{2}

(2)

Q = x^{T} (I - \hat{P} {\hat{P}}^{T}) x \leq Q_{\lim}

(3)

where $Λ = diag (λ_{1}, λ_{2}, \dots, λ_{k})$ , $λ_{i}$ is the ith eigenvalue of $Δ$ $(Δ = (X^{T} X) / (n - 1))$ , and $T_{\lim}^{2}$ and $Q_{\lim}$ are the confidence limits of the statistics.
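Equations (1)–(3) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this article: the function names, the autoscaling step, the default CPV threshold of 0.85, and the use of empirical training quantiles as stand-ins for $T_{\lim}^{2}$ and $Q_{\lim}$ are all assumptions made for illustration.

```python
import numpy as np

def fit_pca_monitor(X, cpv=0.85, alpha=0.01):
    """Fit a PCA monitoring model on normal data X (n samples x m variables)."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                        # autoscaling (assumed preprocessing)
    cov = Xs.T @ Xs / (Xs.shape[0] - 1)          # Delta = X^T X / (n - 1)
    eigval, eigvec = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]
    ratio = np.cumsum(eigval) / eigval.sum()     # cumulative percentage variance
    k = min(int(np.searchsorted(ratio, cpv)) + 1, len(eigval))
    model = {"mean": mean, "std": std, "P": eigvec[:, :k], "lam": eigval[:k], "k": k}
    t2, q = monitor_stats(model, X)
    # Assumption: empirical (1 - alpha) quantiles replace the analytical limits.
    model["T2_lim"] = np.quantile(t2, 1 - alpha)
    model["Q_lim"] = np.quantile(q, 1 - alpha)
    return model

def monitor_stats(model, X):
    """Return the T^2 and Q statistics of equations (2) and (3) for each row of X."""
    Xs = (np.atleast_2d(X) - model["mean"]) / model["std"]
    T = Xs @ model["P"]                          # scores in the PCS
    t2 = np.sum(T**2 / model["lam"], axis=1)     # x^T P Lambda^{-1} P^T x
    resid = Xs - T @ model["P"].T                # (I - P P^T) x
    q = np.sum(resid**2, axis=1)
    return t2, q
```

A sample would be flagged when either statistic returned by `monitor_stats` exceeds the corresponding stored limit.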

Method

Sub-block division and angle-relevant variable selection

Because a large amount of data is collected in a plant-wide process, the correlations among process variables are complex. The multi-block scheme can solve this problem by dividing the variables into different sub-blocks so that each sub-block contains the relevant variables. In ABPCA, the block division is carried out automatically along the PC directions, and an angle-based variable selection method is used in each sub-block.^35,36

Assume that the process data are $X$ . Using PCA decomposition for $X$ gives

X = \hat{T} {\hat{P}}^{T} + \bar{T} {\bar{P}}^{T}

(4)

The loading matrices in PCS and RS can be obtained as

\hat{P} = [{\hat{P}}_{1}, {\hat{P}}_{2}, \dots, {\hat{P}}_{k}]

(5)

\bar{P} = [{\bar{P}}_{1}, {\bar{P}}_{2}, \dots, {\bar{P}}_{m - k}]

(6)

The items in equations (5) and (6) are orthogonal to each other. For accuracy, diversity is required in the block division step. This diversity between the sub-blocks can be achieved when the individual sub-block models are built in these uncorrelated directions. On the other hand, if the most relevant variables are selected in each sub-block, the most significant information can be preserved. Therefore, $k$ distributed sub-blocks can be constructed, one in each PC direction. Since the residual part and the PCs are uncorrelated, an additional sub-block needs to be built in the RS. The sub-blocks are then obtained as

{subblock}_{b} = {\hat{P}}_{b}, b = 1, 2, \dots, k

(7)

{subblock}_{k + 1} = [{\bar{P}}_{1}, {\bar{P}}_{2}, \dots, {\bar{P}}_{m - k}]

(8)

After constructing the sub-blocks, the variables in each sub-block need to be determined. In this article, the angle is used to quantify the contribution between each variable and each sub-block. The smaller the angle between variable and sub-block, the greater the relevance of variable with the sub-block. For each sub-block, two different types of contribution indices are defined as the criteria to select the process variables.

In PCS, the contribution of the variable $i$ for sub-block $b$ can be calculated by

θ_{i}^{PCS} (i, b) = θ (X_{i}, {\hat{T}}_{b})

(9)

where $b = 1, 2, \dots, k$ , $i = 1, 2, \dots, m$ , $X_{i} \in R^{n \times 1}$ is the ith column of $X$ , and ${\hat{T}}_{b}$ is the bth column of the score matrix $\hat{T}$ . The contribution of the variable $i$ for the sub-block constructed in RS can be calculated by^37,38

θ_{i}^{RS} (i, k + 1) = θ (X_{i}, {\bar{T}}_{k + 1})

(10)

Based on the vector $X_{i}$ and the matrix $\bar{T}$ , the angle between the vector and the subspace spanned by the columns of the matrix can be calculated. Applying the $QR$ decomposition to $X_{i}$ and $\bar{T}$ gives the corresponding orthonormal bases

\bar{T} = Q_{\bar{T}} R_{\bar{T}}

(11)

X_{i} = Q_{i} R_{i}

(12)

The singular value decomposition (SVD) of $Q_{\bar{T}}^{T} Q_{i}$ is

Q_{\bar{T}}^{T} Q_{i} = U_{i} Σ_{i} V_{i}^{T}

(13)

The angle between the vector and the subspace spanned by the matrix can be obtained by

θ_{i}^{RS} = \arccos (U_{i}^{T} Q_{\bar{T}}^{T} Q_{i} V_{i}), 0 ° \leq θ_{i}^{RS} \leq 90 °

(14)

Thus, according to the contribution indices given by equations (9) and (10), the variables with the highest contribution, that is, the smallest angles, are included in the corresponding sub-blocks.
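The angle computations of equations (9)–(14) and the resulting variable selection can be sketched as follows. This is an illustrative sketch only: the function names and the `n_select` parameter are hypothetical, and folding the vector angle into the range 0°–90° by taking the absolute cosine is an assumption, since the article does not state how obtuse angles are treated.

```python
import numpy as np

def vector_angle_deg(x, t):
    """Angle in degrees between vectors x and t, folded into [0, 90] (assumed)."""
    c = abs(x @ t) / (np.linalg.norm(x) * np.linalg.norm(t))
    return np.degrees(np.arccos(np.clip(c, 0.0, 1.0)))

def subspace_angle_deg(x, T_bar):
    """Angle in degrees between vector x and the column space of T_bar."""
    Q_T, _ = np.linalg.qr(T_bar)                 # orthonormal basis, equation (11)
    Q_x, _ = np.linalg.qr(x.reshape(-1, 1))      # normalised x, equation (12)
    # Singular value of Q_T^T Q_x is the cosine of the angle, equations (13)-(14).
    sigma = np.linalg.svd(Q_T.T @ Q_x, compute_uv=False)
    return np.degrees(np.arccos(np.clip(sigma[0], 0.0, 1.0)))

def select_variables(X, T_hat, T_bar, n_select):
    """Return k + 1 index arrays holding the smallest-angle variables per sub-block."""
    m = X.shape[1]
    blocks = []
    for b in range(T_hat.shape[1]):              # one sub-block per PC direction
        ang = np.array([vector_angle_deg(X[:, i], T_hat[:, b]) for i in range(m)])
        blocks.append(np.sort(np.argsort(ang)[:n_select]))
    ang = np.array([subspace_angle_deg(X[:, i], T_bar) for i in range(m)])
    blocks.append(np.sort(np.argsort(ang)[:n_select]))   # residual sub-block
    return blocks
```

Here `T_hat` and `T_bar` are the score matrices of the PCS and RS from the global PCA decomposition, and each returned index array lists the columns of `X` assigned to one sub-block.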

Fault detection in ABPCA

After the variables in each sub-block are selected according to the contribution indices, the PCA models of the individual sub-blocks can be developed as

X_{b} = {\hat{T}}_{b} {\hat{P}}_{b}^{T} + E_{b}

(15)

where $b = 1, 2, \dots, k + 1$ . The number of PCs in each sub-block is determined according to the CPV rule. Then, the statistics $T^{2}$ , $Q$ , and the confidence limits can be obtained in each sub-block.

In order to provide an intuitive indication of the process status, the monitoring results of all sub-blocks must be combined properly. In view of the efficiency in statistics fusion, Bayesian inference strategy is used here.^16,39 In Bayesian inference, the fault probability with respect to the $T^{2}$ statistic in each sub-block can be computed by

P_{T^{2}} (F | x_{b}) = \frac{P_{T^{2}} (x_{b} | F) P_{T^{2}} (F)}{P_{T^{2}} (x_{b})}

(16)

P_{T^{2}} (x_{b}) = P_{T^{2}} (x_{b} | N) P_{T^{2}} (N) + P_{T^{2}} (x_{b} | F) P_{T^{2}} (F)

(17)

where $P_{T^{2}} (x_{b} | N)$ and $P_{T^{2}} (x_{b} | F)$ are given by¹⁶

P_{T^{2}} (x_{b} | N) = e^{- \frac{T_{b}^{2} (x_{b})}{T_{b, \lim}^{2}}}, P_{T^{2}} (x_{b} | F) = e^{- \frac{T_{b, \lim}^{2}}{T_{b}^{2} (x_{b})}}

(18)

where “ $F$ ” denotes the faulty condition, “ $N$ ” denotes the normal condition, $P_{T^{2}} (F)$ is the significance level $α$ and $P_{T^{2}} (N)$ is $1 - α$ , and $T_{b}^{2} (x_{b})$ and $T_{b, \lim}^{2}$ are the $T^{2}$ statistic and the confidence limit in the bth sub-block, respectively.
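As a small worked illustration of equations (16)–(18), suppose that $α = 0.01$ and that the statistic in one sub-block is twice its limit, $T_{b}^{2} (x_{b}) = 2 T_{b, \lim}^{2}$ . These values are chosen only for illustration and are not taken from the simulations in this article.

```latex
P_{T^{2}} (x_{b} | N) = e^{-2} \approx 0.1353, \quad P_{T^{2}} (x_{b} | F) = e^{-1/2} \approx 0.6065

P_{T^{2}} (F | x_{b}) = \frac{0.6065 \times 0.01}{0.1353 \times 0.99 + 0.6065 \times 0.01} \approx 0.0433
```

The posterior fault probability of about 0.0433 exceeds $α = 0.01$ , so this sub-block signals a fault. When the statistic sits exactly on its limit, both likelihoods equal $e^{-1}$ and the posterior reduces to $α$ .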

Similarly, the fault probability with respect to the $Q$ statistic in each sub-block can be calculated by

P_{Q} (F | x_{b}) = \frac{P_{Q} (x_{b} | F) P_{Q} (F)}{P_{Q} (x_{b})}

(19)

P_{Q} (x_{b}) = P_{Q} (x_{b} | N) P_{Q} (N) + P_{Q} (x_{b} | F) P_{Q} (F)

(20)

where the conditional probabilities $P_{Q} (x_{b} | N)$ and $P_{Q} (x_{b} | F)$ are defined as

P_{Q} (x_{b} | N) = e^{- \frac{Q_{b} (x_{b})}{Q_{b, \lim}}}, P_{Q} (x_{b} | F) = e^{- \frac{Q_{b, \lim}}{Q_{b} (x_{b})}}

(21)

Then, the final probabilistic statistics can be obtained as

BI C_{T^{2}} = \sum_{b = 1}^{k + 1} {\frac{P_{T^{2}} (x_{b} | F) P_{T^{2}} (F | x_{b})}{\sum_{j = 1}^{k + 1} P_{T^{2}} (x_{j} | F)}}

(22)

BI C_{Q} = \sum_{b = 1}^{k + 1} {\frac{P_{Q} (x_{b} | F) P_{Q} (F | x_{b})}{\sum_{j = 1}^{k + 1} P_{Q} (x_{j} | F)}}

(23)

When the value of $BI C_{T^{2}}$ or $BI C_{Q}$ is greater than the significance level $α$ , it is considered that a fault occurs.
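The fusion step of equations (16)–(23) can be sketched for a single sample as below. The function name `bayesian_fusion`, the small floor that avoids division by zero, and the convention of returning 0 when every fault likelihood underflows are assumptions added for numerical safety; they are not part of the method as described.

```python
import numpy as np

def bayesian_fusion(stats, limits, alpha=0.01):
    """Fuse per-block statistics (T^2 or Q) of one sample into a BIC index.

    stats, limits: sequences of length k + 1, one entry per sub-block.
    Returns the BIC value and the per-block posterior fault probabilities.
    """
    stats = np.maximum(np.asarray(stats, dtype=float), 1e-12)  # assumed floor
    limits = np.asarray(limits, dtype=float)
    p_x_n = np.exp(-stats / limits)              # P(x_b | N), equations (18)/(21)
    p_x_f = np.exp(-limits / stats)              # P(x_b | F), equations (18)/(21)
    p_x = p_x_n * (1 - alpha) + p_x_f * alpha    # equations (17)/(20)
    p_f_x = p_x_f * alpha / p_x                  # equations (16)/(19)
    den = np.sum(p_x_f)
    # Weighted combination of equations (22)/(23); 0 if all weights underflow.
    bic = float(np.sum(p_x_f * p_f_x) / den) if den > 0 else 0.0
    return bic, p_f_x
```

With this form, a sample whose statistic equals the limit in every sub-block gives a fused value equal to `alpha`, which is consistent with using $α$ as the detection threshold.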

Fault diagnosis in ABPCA

After detecting the fault, it is necessary to determine the root cause of it. The contribution plot method is widely used in PCA-based monitoring for identifying the responsible variables of fault.^18,26 Below, the contribution plot for the ABPCA is introduced.

For the bth block $x_{b} = [x_{b}^{1} x_{b}^{2} \dots x_{b}^{m_{b}}]$ , the total contribution of jth process variable in bth sub-block is

{CT}_{b}^{j} = \sum_{i = 1}^{k_{b}} ({ct}_{b}^{i, j})

(24)

where $k_{b}$ is the number of PCs in the bth sub-block and ${ct}_{b}^{i, j}$ is the contribution of variable $x_{b}^{j}$ to the ith score $t_{b}^{i}$ , specified as

{ct}_{b}^{i, j} = \frac{t_{b}^{i}}{λ_{b}^{i}} p_{b}^{i, j} (x_{b}^{j})

(25)

where $p_{b}^{i, j}$ is the $(i, j) th$ element of the loading matrix ${\hat{P}}_{b}$ in the bth sub-block, $λ_{b}^{i}$ is the $(i, i) th$ element of the diagonal matrix $Λ_{b}$ , and $m_{b}$ is the number of selected variables. Therefore, the contribution of the variables to the $T^{2}$ statistic can be computed by using the following probabilistic weighting rule

C T^{j} = \sum_{b = 1}^{k + 1} {\frac{P_{T^{2}} (x_{b} | F)}{\sum_{l = 1}^{k + 1} P_{T^{2}} (x_{l} | F)} {CT}_{b}^{j}}

(26)

Similarly, the contribution of the variables to the $Q$ statistic can be calculated by

C Q^{j} = \sum_{b = 1}^{k + 1} {\frac{P_{Q} (x_{b} | F)}{\sum_{l = 1}^{k + 1} P_{Q} (x_{l} | F)} {CQ}_{b}^{j}}

(27)

where

{CQ}_{b}^{j} = {(x_{b}^{j} - {\hat{x}}_{b}^{j})}^{2} = {(x_{b}^{j} - {({\hat{P}}_{b} {\hat{P}}_{b}^{T} x_{b})}^{j})}^{2}

(28)

To obtain more reliable results, this article uses the mean contribution of each variable over a short period of consecutive samples rather than the contribution at a single sample.
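The contribution calculations of equations (24)–(28) can be sketched as follows. The function names are hypothetical, the loadings of each sub-block are assumed to be stored as an $m_{b} \times k_{b}$ array with orthonormal columns, and a variable that is not selected in a sub-block is assumed to receive zero contribution from that sub-block.

```python
import numpy as np

def block_contributions(x_b, P_b, lam_b):
    """Per-variable contributions inside one sub-block.

    x_b: scaled sample (m_b,), P_b: loadings (m_b, k_b), lam_b: eigenvalues (k_b,).
    """
    t = P_b.T @ x_b                                   # scores t_b^i
    ct = (t / lam_b)[:, None] * P_b.T * x_b[None, :]  # ct_b^{i,j}, equation (25)
    CT = ct.sum(axis=0)                               # equation (24)
    CQ = (x_b - P_b @ t) ** 2                         # equation (28)
    return CT, CQ

def fused_contributions(x, blocks, models, p_f_T2, p_f_Q):
    """Weight the block contributions as in equations (26) and (27).

    blocks: list of index arrays; models: list of (P_b, lam_b) tuples;
    p_f_T2, p_f_Q: P(x_b | F) of every sub-block for the T^2 and Q statistics.
    """
    CT, CQ = np.zeros(x.shape[0]), np.zeros(x.shape[0])
    w_T = np.asarray(p_f_T2, dtype=float) / np.sum(p_f_T2)
    w_Q = np.asarray(p_f_Q, dtype=float) / np.sum(p_f_Q)
    for idx, (P_b, lam_b), wt, wq in zip(blocks, models, w_T, w_Q):
        ct_b, cq_b = block_contributions(x[idx], P_b, lam_b)
        CT[idx] += wt * ct_b      # variables outside the block are left unchanged
        CQ[idx] += wq * cq_b
    return CT, CQ
```

Within one sub-block the entries of `CT` sum to the block $T^{2}$ statistic and the entries of `CQ` sum to the block $Q$ statistic, which gives a simple consistency check. Averaging the fused contributions over several consecutive samples yields the mean contributions described above.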

Implementation

The flowchart of ABPCA strategy is presented in Figure 1. The implementation procedures can be summarized as follows:

1. Collect training data set $X$ under normal process condition.

2. Construct sub-blocks based on PCA decomposition using equations (4)–(8); select the variables for each sub-block with respect to angles according to equations (9) and (10).

3. Establish the PCA monitoring model for each sub-block.

4. For each monitored sample, calculate its statistics of all sub-blocks.

5. Combine these statistics by Bayesian inference using equations (16)–(23).

6. Detect the abnormal condition when $BI C_{T^{2}} > α$ or $BI C_{Q} > α$ .

7. If a fault is detected, the root cause of the detected fault is determined by using equations (26) and (27).

Figure 1.

Flowchart of the ABPCA strategy.

Simulation

Numerical example

Consider a simple example with five Gaussian distributed variables, as shown below in equation (29)

[\begin{matrix} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5} \end{matrix}] = [\begin{matrix} 0.90 & 0.88 & 0.52 & 0 & 0 \\ 0.30 & 0.16 & 0.83 & 0 & 0 \\ 0.59 & 0.55 & 0.39 & 0 & 0 \\ 0 & 0 & 0 & 0.28 & 0.13 \\ 0 & 0 & 0 & 0.81 & 0.72 \end{matrix}] [\begin{matrix} t_{1} \\ t_{2} \\ t_{3} \\ t_{4} \\ t_{5} \end{matrix}] + noise

(29)

where $[t_{1}, t_{2}, t_{3}, t_{4}, t_{5}]^{T}$ are the signal sources and the process noise is zero-mean Gaussian noise with a standard deviation of $0.01$ . To construct the model, $1000$ samples are generated under normal conditions as training data. When building the PCA model, two PCs are chosen by using the CPV rule in this example. Therefore, three sub-blocks are constructed. In this example, two variables are selected in each sub-block according to the angle. Three PCA sub-models are developed accordingly.

Two fault datasets, each consisting of 500 samples, are generated to test the proposed method.

Case 1: From sample 151 to sample 500, a step change of 1 is added to $x_{3}$ .

Case 2: From sample 151 to sample 500, a step change of 3 is added to $t_{5}$ .
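The training data and the two fault cases can be generated with a short sketch such as the one below. The article states only that the variables are Gaussian distributed, so the unit-variance, zero-mean signal sources, the random seed, and the helper name `simulate` are assumptions.

```python
import numpy as np

# Mixing matrix of equation (29).
A = np.array([
    [0.90, 0.88, 0.52, 0.00, 0.00],
    [0.30, 0.16, 0.83, 0.00, 0.00],
    [0.59, 0.55, 0.39, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.28, 0.13],
    [0.00, 0.00, 0.00, 0.81, 0.72],
])

def simulate(n, rng, x_fault=None, t_fault=None, start=150):
    """Generate n samples of equation (29).

    x_fault / t_fault: optional (0-based index, step size) applied to a measured
    variable or a signal source from 0-based row `start` (sample 151) onward.
    """
    t = rng.standard_normal((n, 5))              # assumed standard normal sources
    if t_fault is not None:
        t[start:, t_fault[0]] += t_fault[1]
    x = t @ A.T + 0.01 * rng.standard_normal((n, 5))   # noise std of 0.01
    if x_fault is not None:
        x[start:, x_fault[0]] += x_fault[1]
    return x

rng = np.random.default_rng(0)                   # arbitrary seed
X_train = simulate(1000, rng)                    # normal training data
X_case1 = simulate(500, rng, x_fault=(2, 1.0))   # case 1: step of 1 on x3
X_case2 = simulate(500, rng, t_fault=(4, 3.0))   # case 2: step of 3 on t5
```

Because $t_{5}$ enters only $x_{4}$ and $x_{5}$ in equation (29), the step in case 2 shifts these two measured variables by about 0.39 and 2.16, respectively.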

The monitoring results of the traditional PCA and ABPCA are shown in Figures 2–7. Figure 2 shows that the proposed ABPCA method outperforms the traditional PCA method for case 1. Specifically, PCA performs very poorly in Figure 2(a), whereas ABPCA detects the fault from sample 151 onward in Figure 2(b). To show the monitoring performance in more detail, the monitoring results in sub-blocks 1–3 are presented in Figure 3(a)–(c). Based on the angle-relevant variable selection, the variables $x_{1}$ and $x_{3}$ are assigned to sub-block 1, the second sub-block consists of the variables $x_{4}$ and $x_{5}$ , and the third sub-block includes the variables $x_{1}$ and $x_{2}$ . The fault is detected mostly in sub-block 1, because this sub-block contains the variables $x_{1}$ and $x_{3}$ ; a fault in the variable $x_{3}$ is therefore detected there. The results show that the most relevant variables need to be chosen and grouped in the same sub-block. The contribution plots of ABPCA at samples 201–205 are presented in Figure 4(a) and (b). The variables $x_{1}$ and $x_{3}$ are identified as the responsible variables for case 1.

Figure 2.

Monitoring results of case 1: (a) PCA and (b) ABPCA.

Figure 3.

Monitoring results in sub-blocks: (a) sub-block 1, (b) sub-block 2, and (c) sub-block 3.

Figure 4.

Contribution plots of ABPCA: (a) $T^{2}$ and (b) Q.

Figure 5.

Monitoring results of case 2: (a) PCA and (b) ABPCA.

Figure 6.

Monitoring results in sub-blocks: (a) sub-block 1, (b) sub-block 2, and (c) sub-block 3.

Figure 7.

Contribution plots of ABPCA: (a) $T^{2}$ and (b) Q.

The monitoring results for case 2 are shown in Figures 5–7. The performance of ABPCA is again better than that of PCA. Figure 6(a)–(c) shows the monitoring results in each sub-block and Figure 7(a) and (b) shows the contribution plots of ABPCA. The fault is detected in sub-block 2, which includes the variables $x_{4}$ and $x_{5}$ . The contribution plot method also confirms the variables $x_{4}$ and $x_{5}$ as the responsible variables.

TE benchmark process

The TE benchmark process, which has been widely used to evaluate monitoring approaches, is utilized to validate the performance of the proposed method.⁴⁰ The process is composed of five major unit operations: a reactor, a condenser, a compressor, a separator, and a stripper. Figure 8 shows the control structure. The process is described in detail by Downs and Vogel.⁴⁰ The test data consist of 960 samples, and all faults are introduced to the process at sample 161. To develop the offline models for the TE benchmark process, a normal data set with 960 samples and a 99% confidence level are used in this article.

Figure 8.

Control system of the Tennessee Eastman process.

A traditional PCA decomposition is required to construct the ABPCA monitoring model. The number of PCs is selected as 14 for the PCA model according to the CPV rule. Therefore, there are altogether 15 sub-blocks to be built, and six process variables are chosen in each sub-block by using angle selection.

Two process faults are selected to evaluate the performance. The first one is fault 5, which is a step change in the condenser cooling water inlet temperature. Figure 9 shows the monitoring results of PCA and ABPCA. In Figure 9(a), the fault is detected at the beginning, but after sample 370 the process is considered to be normal. However, a slight deviation in the condenser cooling water flow rate persists after sample 370. Figure 9(b) shows that the fault is detected throughout when the proposed ABPCA method is used. To analyze the fault detection performance further, Figure 10 shows the monitoring results of the $Q$ statistics in each block. The statistics in blocks 4, 5, and 6 clearly indicate the fault after sample 370. Fault diagnosis results are shown in Figure 11(a) and (b), which give the average variable contributions of ABPCA at samples 261–265 for the purpose of finding the root cause of this fault. They clearly indicate that the variables $x_{10}$ , $x_{19}$ , and $x_{31}$ are responsible for the fault.

Figure 9.

Monitoring results of TE fault 5: (a) PCA and (b) ABPCA.

Figure 10.

Monitoring results in each sub-block.

Figure 11.

Contribution plots of ABPCA: (a) $T^{2}$ and (b) $Q$ .

Similarly, the monitoring results of PCA and ABPCA for fault 10 are shown in Figure 12. Fault 10 is a random change in the temperature of stream 4 in the TE process. As shown in Figure 12, the $BI C_{Q}$ statistic of the ABPCA method greatly improves the monitoring performance. Figure 13 shows the fault diagnosis results for this fault at samples 701–705; the variables $x_{19}$ , $x_{27}$ , and $x_{31}$ bear the greatest responsibility for the fault.

Figure 12.

Monitoring results of TE fault 10: (a) PCA and (b) ABPCA.

Figure 13.

Contribution plots of ABPCA: (a) $T^{2}$ and (b) $Q$ .

For all 21 faults, the monitoring performances of ABPCA and PCA are presented in Table 1. Furthermore, Bayesian PCA in Ge¹⁶ and DPCA in Ge and Song²³ are used for comparison. The proposed ABPCA performs best in most cases (bold font in Table 1). However, for some faults, the proposed method is inferior to the existing methods. The angle is used to measure the correlation between variables and sub-blocks in this article, but the number of variables in each sub-block is not fixed by the method and is currently determined by experience. Therefore, choosing the appropriate number of variables in each sub-block is an important step to ensure the monitoring performance.

Table 1.

Comparison results of different methods.

Fault number	PCA		BSPCA		DPCA		ABPCA
Fault number	$T^{2}$	$Q$	$BI C_{T^{2}}$	$BI C_{Q}$	$T^{2}$	$Q$	$BI C_{T^{2}}$	$BI C_{Q}$
1	0.01	0	0.01	0	0.01	0	0	0
2	0.02	0.04	0.02	0.02	0.01	0.02	0.01	0.01
3	0.99	0.97	0.99	0.91	0.93	0.93	0.92	0.91
4	0.80	0	0.99	0.95	0	0	0	0.52
5	0.76	0.80	0.77	0.73	0.70	0	0.80	0
6	0.01	0	0	0	0.01	0	0.01	0
7	0	0	0.62	0.61	0	0.12	0	0
8	0.03	0.16	0.03	0.03	0.02	0.04	0.05	0.02
9	0.98	0.98	0.98	0.92	0.93	0.91	0.98	0.97
10	0.70	0.74	0.66	0.43	0.54	0.47	0.80	0.39
11	0.59	0.25	0.57	0.47	0.28	0.17	0.48	0.62
12	0.02	0.10	0.01	0.03	0.01	0.02	0.02	0.01
13	0.06	0.05	0.06	0.05	0.06	0.04	0.09	0.08
14	0.01	0	0	0	0	0	0	0
15	0.98	0.97	0.97	0.90	0.95	0.92	0.96	0.90
16	0.87	0.73	0.75	0.67	0.70	0.52	0.95	0.18
17	0.24	0.05	0.11	0.03	0.15	0.03	0.18	0.04
18	0.11	0.10	0.11	0.09	0.10	0.09	0.11	0.09
19	0.89	0.88	0.85	0.88	0.75	0.55	0.98	0.50
20	0.68	0.50	0.73	0.51	0.48	0.35	0.76	0.51
21	0.61	0.53	0.61	0.41	0.55	0.50	0.54	0.49

Conclusion

In this article, a DPCA method named ABPCA is proposed, in which the variables in each sub-block are selected by angle. The monitoring results of all sub-blocks are combined by Bayesian inference to provide a straightforward indication of the process status. The contribution plot method, one of the most widely used methods in fault diagnosis, is also adopted. Simulation results validate the effectiveness of the proposed algorithm.

This study aims to provide a different solution to PCA-based plant-wide process monitoring. However, the coming era of big data will bring many challenges to plant-wide process monitoring, and big data processing methods are one way to address them. Similar to Zhu et al.,⁴¹ the proposed method can be extended to monitor large-scale industrial processes with big data by using the framework of MapReduce. Due to the uncertainty of big data, obtaining effective information from process data becomes very difficult. Hence, how to extract the effective information from massive process data needs further study.

Footnotes

Handling Editor: Choon Ki Ahn

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (No. 61773183 and No. 61833007) and the national first-class discipline program of Light Industry Technology and Engineering (LITE2018-25).

ORCID iD

Chen Xu

References

1. Zhao, Huang, Liu. Fault detection and diagnosis of multiple-model systems with mismodeled transition probabilities. IEEE T Ind Electron 2015; 62(8): 5063–5071.

2. Jiang, Yan. Multimode process monitoring using variational Bayesian inference and canonical correlation analysis. IEEE T Autom Sci Eng. Epub ahead of print 26 February 2019. DOI: 10.1109/TASE.2019.2897477.

3. Tong, Shi. Decentralized monitoring of dynamic processes based on dynamic feature selection and informative fault pattern dissimilarity. IEEE T Ind Electron 2016; 63(6): 3804–3814.

4. Song, Han, Zhang. Fault diagnosis method for closed-loop satellite attitude control systems based on a fuzzy parity equation. Int J Distrib Sens N 2018; 14(10): 1–14.

5. Zhao, Liu. Sensor fault detection and diagnosis in the presence of outliers. Neurocomputing 2019; 349: 156–163.

6. Zeng, Zhang, Mei. Fault detection in an engine by fusing information from multivibration sensors. Int J Distrib Sens N 2017; 13(7): 1–9.

7. Zhang, Han, Feng. A novel method for node fault detection based on clustering in industrial wireless sensor networks. Int J Distrib Sens N 2015; 11(7): 230521.

8. Zhu, Song. Review and big data perspectives on robust data mining approaches for industrial process modeling with outliers and missing data. Annu Rev Control 2018; 46: 107–133.

9. Yin, Gao, et al. Data-based techniques focused on modern industry: an overview. IEEE T Ind Electron 2015; 62(1): 657–667.

10. Tong, Lan. Distributed partial least squares based residual generation for statistical process monitoring. J Process Contr 2019; 75: 77–85.

11. Song, Gao. Review of recent research on data-based process monitoring. Ind Eng Chem Res 2013; 52(10): 3534–3562.

12. Zhao, Sun. Dynamic distributed monitoring strategy for large-scale nonstationary processes subject to frequently varying conditions under closed-loop control. IEEE T Ind Electron 2019; 66(6): 4749–4758.

13. Bakshi BR. Multiscale PCA with applications to multivariate statistical process monitoring. Am Inst Chem Eng J 1998; 44(7): 1596–1610.

14. Lee, Yoo, Lee IB. Nonlinear process monitoring using kernel principal component analysis. Chem Eng Sci 2004; 59(1): 223–234.

15. Lyman, Georgakis. Plant-wide control of the Tennessee Eastman problem. Comput Chem Eng 1995; 19(3): 321–331.

16. Ge. Review on data-driven modeling and monitoring for plant-wide industrial processes. Chemometr Intell Lab 2017; 171: 16–25.

17. Yao. Distributed parallel deep learning of hierarchical extreme learning machine for multimode quality prediction with big process data. Eng Appl Artif Intel 2019; 81: 450–465.

18. Jiang, Yan, Huang. Performance-driven distributed PCA process monitoring based on fault-relevant variable selection and Bayesian inference. IEEE T Ind Electron 2015; 63(1): 377–386.

19. Chen. Plant-wide industrial process monitoring: a distributed modeling framework. IEEE T Ind Inform 2016; 12(1): 310–321.

20. MacGregor, Jaeckle, Kiparissides. Process monitoring and diagnosis by multiblock PLS methods. Am Inst Chem Eng J 1994; 40(5): 826–838.

21. Westerhuis, Kourti, MacGregor JF. Analysis of multiblock and hierarchical PCA and PLS models. J Chemometr 1998; 12(5): 301–321.

22. Qin, Valle, Piovoso MJ. On unifying multiblock analysis with application to decentralized process monitoring. J Chemometr 2001; 15(9): 715–742.

23. Ge, Song. Distributed PCA model for plant-wide process monitoring. Ind Eng Chem Res 2013; 52(5): 1947–1957.

24. Tong, Song, Yan. Distributed statistical process monitoring based on four-subspace construction and Bayesian inference. Ind Eng Chem Res 2013; 52(29): 9897–9907.

25. Jiang, Yan. Plant-wide process monitoring based on mutual information-multiblock principal component analysis. ISA T 2014; 53(5): 1516–1527.

26. Xu, Zhao, Liu. Distributed plant-wide process monitoring based on PCA with minimal redundancy maximal relevance. Chemometr Intell Lab 2017; 169: 53–63.

27. Tong, Yan. A novel decentralized process monitoring scheme using a modified multiblock PCA algorithm. IEEE T Autom Sci Eng 2017; 14(2): 1129–1138.

28. Tong, Lan, Shi. Fault detection and diagnosis of dynamic processes using weighted dynamic decentralized PCA approach. Chemometr Intell Lab 2017; 161: 34–42.

29. Jiang, Yan. Parallel PCA-KPCA for nonlinear process monitoring. Control Eng Pract 2018; 80: 17–25.

30. Jiang, Yan. Performance-driven optimal design of distributed monitoring for large-scale nonlinear processes. Chemometr Intell Lab 2016; 155: 151–159.

31. Zhang, Song. Nonlinear process monitoring based on linear subspace and Bayesian inference. J Process Contr 2010; 20(5): 676–688.

32. Zhu, Song. Large-scale plant-wide process modeling and hierarchical monitoring: a distributed Bayesian network approach. J Process Contr 2018; 65: 91–106.

33. Jiang, Huang, Ding SX. Bayesian fault diagnosis with asynchronous measurements and its application in networked distributed monitoring. IEEE T Ind Electron 2016; 63(10): 6316–6324.

34. MacGregor, Kourti. Statistical process control of multivariate processes. Control Eng Pract 1995; 3(3): 403–414.

35. Song. Process monitoring based on independent component analysis-principal component analysis (ICA-PCA) and similarity factors. Ind Eng Chem Res 2007; 46(7): 2054–2063.

36. Huang, Yan. Angle-based multiblock independent component analysis method with a new block dissimilarity statistic for non-Gaussian process monitoring. Ind Eng Chem Res 2016; 55(17): 4997–5005.

37. Bjorck, Golub. Numerical methods for computing angles between linear subspaces. Math Comput 1971; 27(123): 579–594.

38. Zhu, Knyazev. Angles between subspaces and their tangents. J Numer Math 2013; 21(4): 325–340.

39. Bishop CM. Pattern recognition and machine learning. London: Springer, 2006.

40. Downs, Vogel EF. A plant-wide industrial process control problem. Comput Chem Eng 1993; 17(3): 245–255.

41. Zhu, Song. Distributed parallel PCA for modeling and monitoring of large-scale plant-wide processes with big data. IEEE T Ind Inform 2017; 13(4): 1877–1885.