Abstract
An immune relevant vector machine (IRVM) based intelligent classification method is proposed by combining the random real-valued negative selection (RRNS) algorithm and the relevant vector machine (RVM) algorithm. The proposed method aims to handle the training problem caused by missing or incomplete fault sample data and is inspired by the "self/nonself" recognition principle in artificial immune systems. The detectors generated by the RRNS are treated as the "nonself" training samples and are used to train the RVM model together with the "self" training samples. After the training succeeds, a "nonself" detection model, which requires only the "self" training samples, is obtained for fault detection and diagnosis. The method provides a general way of solving problems of this type and can be applied to both fault detection and fault diagnosis. The standard Fisher's Iris flower dataset is used to experimentally verify the proposed method, and the results are compared with those of the support vector data description (SVDD) method. Experimental results show the validity and practicability of the proposed method.
1. Introduction
The system failure of a complex mechatronic machine is usually caused by the failure of its critical components. The essence of fault diagnosis is to obtain the correct feature parameters of the critical parts by monitoring the relevant internal and external signals. Therefore, intelligent fault detection and diagnosis is widely treated as a pattern recognition problem, in which the normal state and the fault states are categorized as different patterns. A rational model is set up between the state features and the different patterns; this model, also called the fault detection model, can intelligently recognize the working status of the devices.
The literature on fault detection methods is very rich. Most of it concerns supervised learning methods, such as neural networks [1], support vector machines [2–4], relevance vector machines [5–8], naive Bayes [9–11], and decision trees [12]. Supervised learning methods require a sufficient number of fault data for training; however, fault data are usually insufficient in practice, particularly at the early stages of operation, where often only normal data samples are available. Under this circumstance, the traditional intelligent diagnosis methods have their deficiencies: the fault diagnosis model can hardly be trained with only normal samples; moreover, a fault detection model trained with normal data together with only part of the failure data cannot identify unknown faults and always assigns them wrongly to some known category.
Inspired by the “self/nonself” recognition principle in artificial immune systems, an immune relevant vector machine algorithm is proposed to solve the problems with traditional methods. This method combines the random real-valued negative selection (RRNS) algorithm and the relevant vector machine (RVM) algorithm, detecting the “nonself” mode with only “self” training samples. Based on the proposed algorithm and its utilization with other methods, accurate fault detection and unknown failure detection can be performed with only normal samples or incomplete training samples.
The remainder of this paper is organized as follows. Section 2 introduces the immune relevant vector machine nonself detection algorithm. Section 3 describes the fault detection and diagnosis models based on the immune relevant vector machine. In Section 4, the standard Fisher's Iris flower data are used to experimentally verify the proposed algorithm and models, and the results are compared with another anomaly detection algorithm. Section 5 gives the conclusions.
2. Immune Relevant Vector Machine “Nonself” Fault Detection Algorithm
2.1. Definitions
Definition 1 (system state space). The system state can be denoted by a feature vector x = (x_1, x_2, ..., x_n), where x_i is the value of the ith state feature and n is the number of features. The system state space is denoted as Ω ⊆ R^n, the set of all possible feature vectors.
Definition 2 (self space). The self space is the space of feature vectors whose target status is known, denoted as S, with S ⊂ Ω.
Definition 3 (nonself space). The nonself space is the complement of the self space in the system state space, denoted by N = Ω − S, so that S ∪ N = Ω and S ∩ N = ∅.
Definition 4 (nonself detection). The nonself detection here includes both the anomaly detection and the multiclass fault diagnosis in the sense of the traditional negative selection algorithm. For the nonself detection problem, a state recognition function f: Ω → {self, nonself} is estimated given only a self sample set X ⊂ S, such that f(x) = self for x ∈ S and f(x) = nonself for x ∈ N.
Note that, for brevity, self data/samples denote data/samples in the self space, which refer to normal data or data of a known class, and nonself data/sample denote data/sample in the nonself space, which are abnormal data or data that do not belong to a given class.
2.2. Improved Normalization Method
Each dimension of the feature vector is conventionally normalized into [0, 1] by the classical min-max method

x_i' = (x_i − min[X_i]) / (max[X_i] − min[X_i]),    (2)

where max[X_i] and min[X_i] are the maximum and the minimum values of the ith feature over the available (self) sample set.
The disadvantage of the previous method is as follows.
The space [0, 1]^n obtained by normalization using (2) is actually the self space: its boundary is determined by the maximum and minimum values of the self samples, which cannot represent the whole system state space. Therefore, the detectors generated in this [0, 1]^n space cannot represent the nonself space.
The unknown samples must be normalized following the same procedure as the original samples to guarantee their comparability. However, some feature of an unknown sample may exceed max[X_i] or fall below min[X_i] obtained from the self samples, so that its normalized value falls outside [0, 1].
For real-world applications, the range of the feature vector in the nonself space is unknown, and only part of the self space samples can be used to estimate the self space. It is very important to normalize the whole system state space before any other manipulations. An improved normalization method is proposed based on the RRNS and the RVM methods.
Let max[X_i^self] and min[X_i^self] be the maximum and the minimum values of the ith feature over the self space samples, and let max[X_i] and min[X_i] be the corresponding bounds of the whole system state space.
Assumption 5. Assume that the self space is in the middle of the whole system state space and that the range of each feature in the self space is 1/3 of that of the whole space; the nonself space occupies the larger 1/3 and the smaller 1/3 of the range. The following equations hold accordingly:

max[X_i] = max[X_i^self] + (max[X_i^self] − min[X_i^self]),
min[X_i] = min[X_i^self] − (max[X_i^self] − min[X_i^self]).    (3)
Assumption 6. Assume that any feature value falling outside [0, 1] after normalization is truncated to the nearest boundary, 0 or 1; such truncated samples belong to the nonself space.
The following normalization is proposed based on Assumptions 5 and 6:

x_i' = (x_i − min[X_i]) / (max[X_i] − min[X_i]).    (6)
By substituting (3) into (6), the normalization equation related only to the self space samples is obtained:

x_i' = (x_i − min[X_i^self] + (max[X_i^self] − min[X_i^self])) / (3(max[X_i^self] − min[X_i^self])).    (7)
After all n dimensions are normalized with (7), the feature vector is mapped into [0, 1]^n, with the self samples lying in the middle [1/3, 2/3] range of every dimension.
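As an illustration, the improved normalization of (7), together with the truncation of Assumption 6, can be sketched in a few lines of Python (the function name and array layout are our own, not from the paper):

```python
import numpy as np

def improved_normalize(X_self, X=None):
    """Sketch of the improved normalization: map each feature so that the
    self samples occupy the middle third of [0, 1] (Assumption 5).
    X_self: self (normal) training samples, shape (m, n).
    X: samples to normalize (defaults to X_self itself)."""
    if X is None:
        X = X_self
    lo = X_self.min(axis=0)      # per-dimension min over the self samples
    hi = X_self.max(axis=0)      # per-dimension max over the self samples
    width = hi - lo              # self-space range per dimension
    # Whole-space bounds are one self-width on each side (eq. (3)),
    # so the denominator in eq. (7) is 3 * width.
    X_norm = (X - lo + width) / (3.0 * width)
    # Assumption 6: values outside [0, 1] are truncated to the boundary;
    # such samples are treated as nonself.
    return np.clip(X_norm, 0.0, 1.0)
```

With this mapping, the self training samples themselves always land in [1/3, 2/3] in every dimension.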
Remarks. This algorithm combines RRNS and RVM, where the detectors generated by RRNS are considered as the set of nonself samples, together with the set of self space samples, serving as the two classes of samples for training the RVM classification model. The main purpose here is to construct a hyperplane for the classification between the self space and the nonself space.
The proposed normalization method maps the original self space into the center of the system state space surrounded by the nonself space. Therefore, it is possible to train and find the hyperplane separating the self space and the nonself space.
The normalized samples occupy the center 1/3 of the range in every dimension, so the self space and the nonself space are balanced in both the positive and the negative directions, which enables the training of the classifying hyperplane, that is, the smallest irregular hypersphere containing the self space samples.
After the training succeeds, both the samples requiring truncation and their values after truncation are considered nonself samples; therefore, the truncation does not affect the classification, and the classification model can be applied in the whole system state space.
2.3. Immune Relevant Vector Machine (IRVM) Nonself Detection Algorithm
According to the mechanism of biological immune recognition, the traditional immune recognition algorithm generates a large number of detectors using different negative selection algorithms. The detectors are then compared with unknown state samples to detect anomalies. Although the method is simple and intuitive, its efficiency is usually low because the number of detectors is too large in practice, which inhibits the application of the method.
In this study, the detectors are used as the nonself supervised learning samples because they are evenly distributed in the nonself space; the RVM is then trained on them together with the self samples. These are the key ideas of the immune relevant vector machine nonself detection algorithm.
Figure 1 is the flowchart of the IRVM nonself detection algorithm, including the training phase and the detection phase. During the training phase, the training samples are the self space samples instead of the nonself space samples. Three steps are required during the training phase.

Flowchart of the relevance vector machine based nonself-detection algorithm.
Step 1. The system state space normalization: the self sample set is normalized according to (7), yielding the normalized self sample set.
Step 2. Nonself sample generation: a fixed number of detectors are generated in the normalized system state space using RRNS based on the normalized self sample set; the detectors serve as the nonself samples.
Step 3. RVM classification model training: the normalized self sample set and the detector set are merged into a two-class training set, with the self samples as one class and the detectors as the other, on which the RVM classification model is trained.
The IRVM training algorithm can be considered as a combination of the normalization, RRNS, and the RVM classification algorithm, as shown in Figure 1. IRVM nonself detection algorithm requires only one class of training samples and can obtain the RVM classification model for two classes.
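A minimal sketch of the training phase follows, assuming a Euclidean matching radius r for the RRNS step. Since no standard relevance vector machine implementation is assumed available here, a plain logistic model stands in for the RVM classifier purely for illustration; the paper's method uses a true RVM:

```python
import numpy as np

def rrns_detectors(X_self, n_detectors, r, rng=None):
    """Random real-valued negative selection (sketch): draw random
    candidates in the normalized state space [0, 1]^n and keep those
    farther than radius r from every self sample -- the survivors act
    as the nonself training samples."""
    rng = np.random.default_rng(rng)
    n = X_self.shape[1]
    detectors = []
    while len(detectors) < n_detectors:
        c = rng.random(n)  # candidate point in [0, 1]^n
        if np.linalg.norm(X_self - c, axis=1).min() > r:
            detectors.append(c)  # survives negative selection
    return np.array(detectors)

def train_two_class(X_self, X_nonself, epochs=500, lr=0.5):
    """Stand-in for the RVM step: a simple logistic model trained on the
    self samples (label 1) and the RRNS detectors (label 0); its output
    1/(1+exp(-(x@w + b))) plays the role of the posterior probability."""
    X = np.vstack([X_self, X_nonself])
    y = np.r_[np.ones(len(X_self)), np.zeros(len(X_nonself))]
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # posterior P(self | x)
        w -= lr * X.T @ (p - y) / len(y)        # gradient step on weights
        b -= lr * (p - y).mean()                # gradient step on bias
    return w, b
```

The point of the sketch is the pipeline itself: only self samples enter, yet a two-class probabilistic model comes out, because the detectors stand in for the missing nonself class.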
During the detection phase, the IRVM nonself detection model contains two parts, the normalization process and the trained RVM classification model, whose output is the posterior probability that the sample is a self sample. If the posterior probability is greater than 0.5, the sample is judged to be a self sample; otherwise, it is a nonself sample.
The combination of the RRNS algorithm and the RVM algorithm in the IRVM nonself detection algorithm exploits the RRNS ability to simulate nonself samples in the nonself space and, at the same time, retains the RVM advantages of simple and fast detection.
3. Fault Detection and Diagnosis Based on Immune Relevant Vector Machine
3.1. Fault Detection Model Based on IRVM
Fault detection is a typical two-class classification problem, since only the normal state and the fault state exist. The fault detection model based on IRVM classification needs only the normal samples: it treats the feature vector space under normal conditions as the self space and the feature vectors under fault conditions as the nonself space.
The IRVM fault detection system is shown in Figure 2, including the model training phase and the online detection phase; compared with Figure 1, this figure covers the complete system.

Immune relevance vector machine based fault detection system.
Two steps are included in the model training phase.
Step 1. A number of normal state samples are collected, the characteristic features are extracted, the feature vectors are constructed, and the set of normal samples is formed. According to the real working conditions, data of different types may be collected so that the final detection model remains valid under different working conditions.
Step 2. The IRVM fault detection model is gained after the training of the normal state samples according to the IRVM training algorithm.
During the online fault detection phase, the state data are collected, the feature vectors are extracted, and the unknown state samples to be tested are then formed and input to the IRVM fault detection model. Whether the state is a fault state or a normal state is determined according to the posterior probability output.
3.2. Fault Diagnosis Model Based on IRVM
Traditional immune detection algorithms determine the self and nonself modes and then detect anomalies. In many cases, detecting a fault is not enough: the class and the level of the fault also need to be identified, which forms a multiclass classification problem. How multiple fault classes can be diagnosed using the immune relevant vector machine nonself detection algorithm is explained in the following paragraphs.
The framework of IRVM fault diagnosis model for multiple classes is plotted in Figure 3. The model is composed of multiple IRVM single class fault diagnosis models. Each IRVM single class fault diagnosis model is trained in the same way as that shown in Figure 2, while its training sample set is the fault sample set of one specific fault class instead of a normal sample set. During the training phase, one IRVM detection model for one class of failure is trained according to the IRVM training algorithm.

Principle of the construction of IRVM based multiclass fault diagnosis model.
For detecting multiple fault classes, the procedure is as follows. First, the unknown sample to be detected is input to every single-class IRVM detection model, and each model outputs a posterior probability associated with one class of failure. Then the largest posterior probability is selected, and its class becomes the candidate failure class. If that posterior probability is greater than 0.5, the candidate class of failure is the model output; otherwise, if no class has a posterior probability greater than 0.5, the testing sample does not belong to any known fault class and is detected as an unknown class.
The IRVM multiclass fault diagnosis model thus includes the IRVM detection models of all target fault classes together with this combinational detection rule.
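The combinational detection rule described above can be sketched as follows (the function name and the explicit threshold parameter are illustrative):

```python
def diagnose(posteriors, threshold=0.5):
    """Combinational detection rule (sketch): given the posterior output
    of each single-class IRVM model for one test sample, pick the class
    with the largest posterior; if even that posterior is below the
    threshold, the sample belongs to no known fault class.
    `posteriors` maps a fault-class label to its model's posterior."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] > threshold else "unknown"
```

For example, a sample scoring 0.9 on one model is assigned that class, while a sample scoring below 0.5 on every model is reported as an unknown class.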
The structure of the IRVM multiclass detection model is similar to that of the standard RVM one-against-all (RVM-OAA) algorithm. However, the OAA method uses the samples of the other classes as the nonself samples, which do not actually cover the whole nonself space; this may lead to misclassification of unknown classes and wrong diagnoses. Therefore, IRVM has better generalization ability than the RVM-OAA method.
4. Experimental Validation and Analysis
In order to verify the effectiveness of the IRVM nonself detection algorithm, the standard Fisher's Iris dataset is used for experimental verification and analysis. The experiment results are also compared with those of the support vector data description (SVDD) [15] method, another commonly used anomaly detection method in the area of fault detection.
4.1. Experiments on Model Verification with the Fisher's Iris Data
Fisher's Iris is a commonly used benchmark dataset for algorithm verification in machine learning. The Iris dataset has four features of three types of irises: Setosa, Virginica, and Versicolor. Each type has 50 samples, 150 samples in total. The four features are the petal length, petal width, sepal length, and sepal width.
During the experiments, Setosa is considered as the self type, while Virginica and Versicolor are considered as the nonself types. Forty Setosa samples are randomly chosen as the self space training samples, and all the remaining samples are used for testing. The experiment is repeated 5 times to reduce the random error and is carried out with 2 features (features 1 and 2) and with all 4 features separately so that the algorithm's effectiveness can be visualized. The 2-dimensional data distribution is shown in Figure 4: Figure 4(a) is the original data distribution and Figure 4(b) is the distribution after the normalization.
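The sampling scheme of this experiment can be sketched as follows (a generic split helper; the label encoding 0/1/2 for the three iris types and the function name are our own, and the actual dataset loading is omitted):

```python
import numpy as np

def self_nonself_split(X, y, self_label, n_train=40, rng=None):
    """Experiment setup sketch: take n_train random samples of the self
    class (here Setosa) for training; everything else, including all
    samples of the other two classes, becomes the test set."""
    rng = np.random.default_rng(rng)
    idx = np.flatnonzero(y == self_label)          # indices of self class
    train = rng.choice(idx, size=n_train, replace=False)
    test = np.setdiff1d(np.arange(len(y)), train)  # all remaining samples
    return X[train], X[test], y[test]
```

With 150 Iris samples and 40 Setosa training samples, this leaves 110 test samples: the 10 remaining Setosa samples plus all 100 nonself samples.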

Data distribution of the Iris dataset. (a) Original value. (b) Normalized value.
4.2. Results and Analysis
Tables 1 and 2 list the results for the 2-dimensional and the 4-dimensional Iris data, respectively. Tr. Err, Te. FP, and Te. FN are the training error rate, the testing false positive rate, and the testing false negative rate, respectively.
Results for the 2-dimensional Iris data.
Results for the 4-dimensional Iris data.
From Tables 1 and 2, the following conclusions are drawn:
IRVM does not have a training error while SVDD does.
Both IRVM and SVDD have a high accuracy for nonself testing samples; that is, both have low false negative rates. When all 4 features are considered, the false negative rates are zero, which indicates that the self set and the nonself set are highly separable.
IRVM has a better false positive rate than SVDD.
Generally, IRVM is more accurate and has better generalization ability when the self set and nonself set are highly separable.
The RRNS-RVM training and testing results are depicted in Figures 5 and 6, respectively, and Figure 7 gives the comparative results using SVDD. Figure 5 shows the distribution of the self training samples, the distribution of the nonself training samples generated by RRNS, the dividing line of the RVM model, the relevant vector points, and the posterior probability distribution. The following can be seen:
RRNS can generate more evenly distributed nonself detectors, that is, nonself samples.
The dividing line from the classification model trained using IRVM wraps the self training sample set as a closed curve, separating the self space and the nonself space.
The improved normalization method balances the self space and the nonself space in every direction, which guarantees the training accuracy of the RVM model.

Training results of the IRVM experiment of Iris dataset.

Testing results of the IRVM experiment of Iris dataset.

Training and testing results of the SVDD experiment of Iris dataset.
Comparing Figures 5, 6, and 7 by observing the dividing lines of IRVM and SVDD, it is seen that IRVM has zero training error while SVDD always has some training error. The reason is that IRVM seeks, through training, a closed boundary that contains the whole self sample space, whereas SVDD originates from anomaly detection and tries to exclude abnormal samples so as to minimize the false positive rate [16, 17]. For the same reason, the IRVM model has a much lower false negative rate than the SVDD model.
The Iris dataset is a relatively simple and small dataset, and the experiments here are mainly for illustrative purposes. Interested readers may use other datasets [18] to further test the performance of the proposed method.
5. Conclusions
In this paper, an immune relevant vector machine intelligent fault detection and diagnosis method is proposed, inspired by the "self/nonself" recognition mechanism in artificial immune systems. The method is able to handle the training problem caused by missing or incomplete fault samples in traditional intelligent classification algorithms, since it combines the random real-valued negative selection (RRNS) algorithm from the artificial immune system with the relevant vector machine (RVM) algorithm. The detectors generated by RRNS are considered as the nonself training samples, which are used to train the RVM model together with the self samples. The detection model is then obtained requiring only the self samples. Based on this model and method, fault detection can be performed using only the normal samples, and both known and unknown faults can be identified accurately with incomplete training samples. The proposed method adopts the advantages of the RRNS algorithm and outperforms the traditional immune algorithms with reduced computation load and improved efficiency.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant 61175038 and by Shanghai Education Commission Projects (nos. 12YZ010, 12JC1404100, and 11CH-05).
