Fault feature extraction method based on local mean decomposition Shannon entropy and improved kernel principal component analysis model

Abstract

To effectively extract the typical features of bearings, a new method combining local mean decomposition (LMD) Shannon entropy with an improved kernel principal component analysis (KPCA) model is proposed. First, the original signals are decomposed by LMD, a time–frequency domain method, and the Shannon entropy of the separated product functions is calculated to obtain the original features. Because these features still contain superfluous information, KPCA, a nonlinear multi-feature processing technique, is introduced to fuse them, and KPCA is improved with a weight factor. The extracted features are input into a Morlet wavelet kernel support vector machine to obtain a classification model of the bearing running state, by which the running state is identified. Both a test case and an actual application case are analyzed.

Keywords

Local mean decomposition, weighted kernel principal component analysis, Morlet wavelet kernel support vector machine, bearing fault features

Introduction

Bearings are widely used in rotating machinery, and a bearing failure can cause the machine to break down. It is therefore important to recognize the fault features of a bearing and thereby identify its running condition.

Two key issues must be addressed to identify the bearing running state. First, typical features must be extracted from the original signals, making use of a variety of information so that the bearing state can be identified precisely. Second, an effective model is needed to recognize the running state of the bearing from these features. Time-domain, frequency-domain, and time–frequency domain analysis methods are used for feature extraction,¹ and time–frequency methods such as the wavelet transform and empirical mode decomposition (EMD) are widely reported. However, it is difficult to select the mother wavelet for the wavelet transform, and EMD² suffers from mode mixing and end effects. In this article, local mean decomposition (LMD) is introduced to process the signals.³,⁴ Entropy is commonly used to extract features. Traditional entropy is derived from the concept of probability and measures the discrimination of criteria.⁵ De Luca⁶ used a nonprobabilistic entropy and introduced several requirements to capture the intuitive comprehension of the degree of fuzziness, and Szmidt and Kacprzyk⁷ studied nonprobabilistic entropy further. Shannon entropy, fuzzy entropy, and intuitionistic fuzzy entropy have since been defined, and Shannon entropy is regarded as the more powerful.⁸ In this research, Shannon entropy is used to extract the typical features.

After the features are extracted by LMD Shannon entropy, they are high dimensional and cannot be processed precisely by an artificial intelligence model, because the information hidden in the features is not adequately mapped and such models are mainly designed for low-dimensional features. A feature dimension reduction method is therefore needed. Available methods include principal component analysis (PCA),⁹ kernel principal component analysis (KPCA),¹⁰ and manifold-learning algorithms such as local tangent space alignment (LTSA).¹¹ However, PCA is mainly suited to linear data sets, whereas bearing vibration features are usually dominated by nonlinear characteristics, so PCA cannot work effectively. KPCA and LTSA can both extract nonlinear features, but LTSA is unstable and its neighborhood size is difficult to choose. In this research, KPCA is selected, and a weight factor is added to the model to further improve dimension reduction and feature selection.

Based on the typical features extracted by LMD Shannon entropy and the weighted KPCA method, a support vector machine (SVM) serves as the classifier.¹² However, the standard SVM model has limitations in processing nonlinear features. Some researchers have combined SVM with wavelet theory for classification and obtained better performance than other learning machine models; in this research, a new SVM model is constructed based on the Morlet wavelet kernel.¹³

This article is organized as follows. In section “Feature extraction,” the concept of LMD energy entropy is introduced, the LMD energy entropies are calculated, and the improved nonlinear embedding method, weighted KPCA, is described. In section “The Morlet wavelet kernel SVM model,” the Morlet wavelet kernel SVM model is constructed. In section “Validation,” the proposed method is verified with bearing test data. The conclusions of the research are given in section “Conclusion.”

The flowchart of the proposed method is shown in Figure 1.

Figure 1.

Flowchart of the proposed method.

Feature extraction

In LMD,³,⁴ the product function (PF) components $P F_{p} (t)$ are separated from a signal $x (t)$, leaving a residue $u_{k} (t)$ that is a monotonic function, so that $x (t)$ is decomposed into the sum of k PF components and $u_{k} (t)$:

x (t) = \sum_{p = 1}^{k} P F_{p} (t) + u_{k} (t)

(1)

Equation (1) shows that the decomposition is complete: the PFs and the residue sum back to $x (t)$, so the features of the vibration signal are retained in the PFs.

Once the k PF components and $u_{k} (t)$ are obtained, the energies $E_{1}, E_{2}, \dots, E_{k}$ of the k PFs can be calculated. Owing to the orthogonality of the LMD decomposition, the sum of the energies of the k PFs should equal the total energy of the original signal when the residue $u_{k} (t)$ is ignored. Because the PFs contain different frequency components, $\{E_{1}, E_{2}, \dots, E_{k}\}$ forms an energy distribution of the roller bearing vibration signal in the frequency domain, and the corresponding LMD energy entropy is defined as

H_{entropy} = - \sum_{i = 1}^{k} p_{i} \log p_{i}

(2)

where $p_{i} = E_{i} / E$ is the proportion of the energy of the ith PF in the total signal energy $E = \sum_{i = 1}^{k} E_{i}$.
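As a minimal sketch of equation (2), the function below computes the LMD energy entropy from PFs that have already been extracted. It assumes each PF is a list of sampled values, that the discrete energy $E_{i}$ is the sum of squared samples, and that the natural logarithm is used; the source does not state these details, and the LMD decomposition itself is not implemented here.

```python
import math


def lmd_energy_entropy(pfs):
    """Energy entropy of equation (2) for a list of PFs (residue u_k excluded)."""
    # Assumed discrete energy: E_i = sum of squared samples of PF_i
    energies = [sum(v * v for v in pf) for pf in pfs]
    total = sum(energies)  # E = E_1 + ... + E_k
    if total == 0:
        raise ValueError("all PFs have zero energy")
    h = 0.0
    for e in energies:
        p = e / total  # p_i = E_i / E
        if p > 0:  # convention: 0 * log 0 = 0
            h -= p * math.log(p)
    return h


# Two PFs with equal energy give the maximum entropy for k = 2, i.e. ln 2
print(lmd_energy_entropy([[1.0, -1.0, 1.0, -1.0], [2.0, 0.0, 0.0, 0.0]]))
```

The entropy is largest when the energy is spread evenly over the PFs and zero when a single PF carries all of it, which is why a change in the energy distribution caused by a fault shows up in this value.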

Each signal thus yields one entropy value. Because the bearing vibration signals are collected under different conditions and states, the resulting entropy features are high dimensional, which degrades the classification performance of the SVM model; KPCA is therefore used to reduce the feature dimension.

In KPCA, a set of multidimensional samples $x_{k}, k = 1, \dots, K$, is mapped through a nonlinear function $ϕ (x_{k})$ into a feature space, yielding the mapped data set $Φ = [ϕ (x_{1}), ϕ (x_{2}), \dots, ϕ (x_{K})]$. The scatter matrix for zero-mean data is $C = Φ Φ^{T}$, and a kernel matrix can be constructed as $K = Φ^{T} Φ$. Using the kernel trick, the centered kernel matrix can be expressed as¹⁴,¹⁵

\begin{array}{l} K_{c} = (I - \frac{1}{K} j_{K} j_{K}^{T}) Φ^{T} Φ (I - \frac{1}{K} j_{K} j_{K}^{T}) \\ = (I - \frac{1}{K} j_{K} j_{K}^{T}) K (I - \frac{1}{K} j_{K} j_{K}^{T}) \end{array}

(3)

where $j_{K} = [1, 1, \dots, 1]^{T}$ is a $K \times 1$ vector of ones and $I$ is the $K \times K$ identity matrix. Because each element $k (i, j) \equiv k (x_{i}, x_{j})$ of the kernel matrix equals the inner product $ϕ^{T} (x_{i}) ϕ (x_{j})$, it can be computed using only the data $x_{k}$ in the input space. For instance, if a radial basis function (RBF) kernel is used, $k (i, j)$ is calculated as

k (i, j) \equiv k (x_{i}, x_{j}) = \exp (- \frac{| | x_{i} - x_{j} | |^{2}}{2 σ^{2}})

(4)

where $σ^{2}$ is a free parameter related to the kernel width, which can be chosen according to any suitable data-spread criterion.
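The source leaves the choice of σ open. One common data-spread criterion, used here purely as an illustrative assumption, is the median heuristic: σ is set to the median Euclidean distance between all pairs of training samples. A minimal sketch, together with a direct evaluation of equation (4):

```python
import math
import statistics
from itertools import combinations


def median_heuristic_sigma(samples):
    """Median pairwise Euclidean distance, one possible data-spread choice for sigma."""
    dists = [math.dist(a, b) for a, b in combinations(samples, 2)]
    return statistics.median(dists)


def rbf(a, b, sigma):
    """Equation (4): exp(-||a - b||^2 / (2 sigma^2))."""
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * sigma ** 2))


samples = [[0.0], [1.0], [3.0]]
sigma = median_heuristic_sigma(samples)  # pairwise distances 1, 3, 2 -> median 2
print(sigma, rbf(samples[0], samples[1], sigma))
```

Tying σ to the typical sample spacing keeps the kernel values away from the degenerate extremes (all close to 1 or all close to 0); other criteria, such as cross-validation, are equally valid.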

The eigendecomposition of $K_{c}$ provides the information needed to project a vector $y_{j}$ of the input space into the feature space, because the nonzero eigenvalues of the scatter matrix $C$ coincide with those of the centered kernel matrix $K_{c}$. Let $V$ be the matrix whose columns are the L leading eigenvectors of $K_{c}$, and let $D$ be the diagonal matrix of the corresponding $L \leq K$ eigenvalues. The image $ϕ (y_{j})$ of a point in the input space can then be projected onto the L directions spanned by the corresponding eigenvectors of the scatter matrix via

z_{j} = D^{- 1 / 2} V^{T} (I - \frac{1}{K} j_{K} j_{K}^{T}) Φ^{T} ϕ (y_{j})

(5)

where $Φ^{T} ϕ (y_{j})$ is the vector whose components can be computed with the kernel trick as

k_{y_{j}} = [k (x_{1}, y_{j}), k (x_{2}, y_{j}), \dots, k (x_{K}, y_{j})]^{T}

(6)
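The following NumPy sketch puts equations (3) to (6) together. It assumes an RBF kernel and that the L retained eigenvalues of $K_{c}$ are strictly positive (true for distinct training points when L ≤ K − 1). The names `fit_kpca` and `project`, the value σ = 2.0, and the random training data are illustrative choices, not taken from the source, and the Relief weighting described next is not included.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Equation (4) for every pair of rows of A (n_a x d) and B (n_b x d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def fit_kpca(X, sigma, L):
    """Eigendecomposition of the centered kernel matrix K_c (equation (3))."""
    n = X.shape[0]  # n plays the role of K (number of training samples)
    center = np.eye(n) - np.ones((n, n)) / n  # I - (1/K) j_K j_K^T
    Kc = center @ rbf_kernel(X, X, sigma) @ center
    vals, vecs = np.linalg.eigh(Kc)  # symmetric matrix, eigenvalues ascending
    top = np.argsort(vals)[::-1][:L]  # indices of the L largest eigenvalues
    return {"X": X, "sigma": sigma, "center": center, "Kc": Kc,
            "D": vals[top], "V": vecs[:, top]}


def project(model, Y):
    """Equations (5)-(6): project each row y_j of Y onto the L directions."""
    k_y = rbf_kernel(model["X"], Y, model["sigma"])  # column j is k_{y_j}
    Z = (model["V"].T @ model["center"] @ k_y) / np.sqrt(model["D"])[:, None]
    return Z.T  # one row of L coordinates per y_j


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))  # e.g. 20 samples of 6 entropy features
model = fit_kpca(X, sigma=2.0, L=3)
Z = project(model, X)
print(Z.shape)
```

A quick sanity check follows from the algebra: for the training points, the projected coordinates along component l have (population) variance equal to the lth eigenvalue divided by K.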

In this research, the Relief method¹⁶ is used to determine the weight factor. Let $x_{i} = \{x_{i 1}, x_{i 2}, \dots, x_{i n}\}$ denote the n feature values of the ith sample in the feature set. For an arbitrary sample $x_{i}$, first search for its k nearest neighbors $y_{j}$ $(j = 1, 2, \dots, k)$ that belong to the same class as $x_{i}$. Then, for each class $C \neq class (x_{i})$, search for its k nearest neighbors $h_{j} (C)$ $(j = 1, 2, \dots, k)$ that belong to class C.

Let P denote the difference between the feature $x_{i}$ and its same-class nearest neighbors $y_{j}$

P = \sum_{j = 1}^{k} \frac{| x_{i} - y_{j} |}{max (X) - min (X)}

(7)

Let Q denote the prior-weighted difference between the feature $x_{i}$ and its nearest neighbors $h_{j} (C)$ from the other classes

Q = \sum_{C \neq class (x_{i})} \frac{G (C)}{1 - G (class (x_{i}))} \times \sum_{j = 1}^{k} \frac{| x_{i} - h_{j} (C) |}{max (X) - min (X)}

(8)

where $G (C)$ is the probability of occurrence of class C, and $max (X)$ and $min (X)$ are the maximum and minimum values of the feature over all samples.

The weight factor is then updated as

W (x_{i}) = W (x_{i}) - \frac{P - Q}{k}

(9)
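A minimal pure-Python sketch of the weight update in equations (7) to (9) follows. It assumes that neighbors are found with the sum of range-normalized feature differences, that every sample is visited once, that the weights start at zero, and that G(C) is the class frequency; as in equation (9), each update is divided by k only (Kononenko's ReliefF additionally divides by the number of sampled instances). How the resulting weights enter the KPCA is not specified in the source and is not shown.

```python
from collections import Counter


def relief_weights(X, labels, k=1):
    """Feature weights updated as in equations (7)-(9), one pass over all samples."""
    n, d = len(X), len(X[0])
    lo = [min(row[a] for row in X) for a in range(d)]
    # max(X) - min(X) per feature; a constant feature gets span 1 to avoid dividing by 0
    span = [(max(row[a] for row in X) - lo[a]) or 1.0 for a in range(d)]

    def diff(a, u, v):
        # range-normalized difference of feature a between samples u and v
        return abs(u[a] - v[a]) / span[a]

    def nearest(i, cls):
        # indices of the k samples of class `cls` closest to sample i
        # (assumed distance: sum of range-normalized feature differences)
        cand = [j for j in range(n) if j != i and labels[j] == cls]
        return sorted(cand, key=lambda j: sum(diff(a, X[i], X[j]) for a in range(d)))[:k]

    prior = {c: cnt / n for c, cnt in Counter(labels).items()}  # G(C): class frequency
    w = [0.0] * d  # weights start at zero
    for i in range(n):
        own = labels[i]
        hits = nearest(i, own)  # y_j
        misses = {c: nearest(i, c) for c in prior if c != own}  # h_j(C)
        for a in range(d):
            P = sum(diff(a, X[i], X[j]) for j in hits)  # equation (7)
            Q = sum(prior[c] / (1.0 - prior[own])
                    * sum(diff(a, X[i], X[j]) for j in idx)
                    for c, idx in misses.items())  # equation (8)
            w[a] -= (P - Q) / k  # equation (9)
    return w


# Feature 0 separates the two classes, feature 1 does not
X = [[0.0, 0.0], [0.1, 1.0], [0.0, 0.5], [1.0, 0.0], [0.9, 1.0], [1.0, 0.5]]
labels = [0, 0, 0, 1, 1, 1]
print(relief_weights(X, labels, k=1))
```

A feature gains weight when it differs across classes (large Q) and loses weight when it varies within a class (large P), so in this toy data feature 0 ends with a positive weight and feature 1 with a negative one.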

The Morlet wavelet kernel SVM model

According to Mercer’s theorem,¹⁷,¹⁸ only a few wavelet functions yield kernel functions that satisfy the Mercer condition. In this research, the Morlet wavelet kernel, which satisfies this condition, is used. The Morlet wavelet function is defined as

ψ (x) = \cos (w_{0} x) e^{- x^{2} / 2}

(10)

where $w_{0}$ is the center frequency. For d-dimensional inputs $x$ and $x'$ with dilation parameters $a_{i}$, the Morlet wavelet kernel function is defined as

\begin{array}{l} k (x, x') = k (x - x') = \prod_{i = 1}^{d} ψ (\frac{x_{i} - x_{i}'}{a_{i}}) \\ = \prod_{i = 1}^{d} \cos (w_{0} \frac{x_{i} - x_{i}'}{a_{i}}) \exp (- \frac{(x_{i} - x_{i}')^{2}}{2 a_{i}^{2}}) \end{array}

(11)

In this research, the Morlet wavelet kernel function serves as the kernel function of the SVM, whose decision function is

f (x) = \operatorname{sgn} (\sum_{i = 1}^{n} α_{i} y_{i} \prod_{j = 1}^{d} \cos (w_{0} \frac{x^{j} - x_{i}^{j}}{a_{j}}) \exp (- \frac{(x^{j} - x_{i}^{j})^{2}}{2 a_{j}^{2}}) + b)

(12)

where $x_{i}$ $(i = 1, 2, \dots, n)$ are the training samples with labels $y_{i} \in \{-1, +1\}$, $α_{i}$ are the Lagrange multipliers, $b$ is the bias, and $x^{j}$ denotes the jth component of $x$.

From equations (11) and (12), the Morlet wavelet kernel SVM is constructed, and the new SVM will work as the model to separate the different bearing fault features.
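A minimal pure-Python sketch of equations (11) and (12) is given below. The support vectors, multipliers and bias are hypothetical placeholder values rather than a trained model (in practice they come from solving the SVM dual problem), a single dilation a is assumed for all dimensions, and the defaults w_0 = 5 and a = 0.3 are the values used later in the validation.

```python
import math


def morlet_kernel(x, z, w0=5.0, a=0.3):
    """Equation (11) with the same dilation a for every dimension."""
    value = 1.0
    for xi, zi in zip(x, z):
        u = (xi - zi) / a
        value *= math.cos(w0 * u) * math.exp(-u * u / 2.0)
    return value


def decide(x, support, alphas, ys, b, w0=5.0, a=0.3):
    """Equation (12): sign of the kernel expansion over the support vectors."""
    s = sum(al * y * morlet_kernel(x, sv, w0, a)
            for sv, al, y in zip(support, alphas, ys)) + b
    return 1 if s >= 0 else -1  # sgn, with 0 mapped to +1 by convention here


# Hypothetical two-vector model, not trained on any data
support, alphas, ys, b = [[0.0], [1.0]], [1.0, 1.0], [1, -1], 0.0
print(decide([0.0], support, alphas, ys, b), decide([1.0], support, alphas, ys, b))
```

The kernel equals 1 for identical inputs and decays with an oscillating envelope as the inputs move apart, which is the localization property the Morlet kernel is chosen for. Equation (12) is a binary classifier; separating the five running states requires a multiclass scheme (for example one-versus-one), which the source does not specify.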

Validation

Case 1

The bearing fault signals of the Case Western Reserve University Bearing Data Center are used to validate the proposed method.¹⁹ A 2-hp Reliance Electric motor is used in the experiments, and the tested bearing is of type SKF 6205-2RS JEM. The tests simulate the normal and faulty running states of the bearing, with fault depths of 0.18, 0.36, 0.53, and 0.71 mm at the outer raceway, the inner raceway, and the ball to reflect the deterioration of the bearing. The faults are seeded by electro-discharge machining. In this case, the inner raceway fault signals are chosen. The sampling rate is 12,000 Hz, and 4096 data points are chosen for analysis. For each running state, 50 groups of vibration data are chosen, 30 for testing and the other 20 for training. Figure 2 shows the collected vibration signals in the different states.

Figure 2.

The collected vibration signals of the normal state and of the inner-race faults with four different fault depths: (a) normal; (b) 0.18-mm inner-race fault; (c) 0.36-mm inner-race fault; (d) 0.53-mm inner-race fault; and (e) 0.71-mm inner-race fault.

LMD is used to decompose each collected vibration signal into its PFs. The original 0.36-mm inner-race fault signal is shown in Figure 3, and its PFs obtained by LMD are shown in Figure 4.

Figure 3.

The original signal of 0.36-mm inner-race fault.

Figure 4.

The PFs decomposed by the LMD of the 0.36-mm inner-race signal.

The signal is decomposed into six PFs and one residue; the decomposition stops when the residue is smaller than a preset amplitude value. The Shannon entropy of every PF is then calculated, and these entropy values serve as the original features indicating the bearing running state. However, these features are still high dimensional and too complex for the recognition model. The weighted KPCA is therefore used to extract the typical features: the calculated entropy features are normalized and input to the KPCA, which reduces the mutual correlation between the features and further purifies the fault characteristics. The resulting eigenvalues and contribution rates are listed in Table 1.

Table 1.

The contribution rate of the Shannon entropy of the decomposed PFs.

Eigenvalues	4.17	0.99	0.62	0.19	0.02	0.005
Contribution rate (%)	69.54	16.61	10.37	3.19	0.28	0.04

PFs: product functions.

From a statistical viewpoint, the principal components whose cumulative contribution rate exceeds 75% are regarded as carrying the main information of the training samples. Here, the first three eigenvalues account for 96.1% of the total contribution rate, so the components corresponding to the first three eigenvalues are chosen as the main input features. To compare the effect of the weighted KPCA on feature extraction, the PCA, KPCA, and weighted KPCA methods are each used to extract the features. The results are shown in Figures 5–7.
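The contribution rate of each component is its eigenvalue divided by the sum of all eigenvalues, and the cumulative contribution rate is the running sum of these rates. The snippet below recomputes both from the rounded eigenvalues printed in Table 1; because of that rounding, the recomputed percentages differ slightly from the tabulated ones, but the first three components again account for about 96% of the total.

```python
from itertools import accumulate

# Eigenvalues as printed (rounded) in Table 1
eigenvalues = [4.17, 0.99, 0.62, 0.19, 0.02, 0.005]
total = sum(eigenvalues)
rates = [lam / total for lam in eigenvalues]  # contribution rate of each component
cumulative = list(accumulate(rates))  # cumulative contribution rate

for i, (r, c) in enumerate(zip(rates, cumulative), start=1):
    print(f"component {i}: {100 * r:6.2f} %, cumulative {100 * c:6.2f} %")
```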

Figure 5.

The feature extraction result based on PCA model.

Figure 6.

The feature extraction result based on KPCA model.

Figure 7.

The feature extraction result based on weighted KPCA model.

Thus, the weighted KPCA reduces the mutual correlation between the normalized entropy features, further purifies the fault characteristics, and retains the principal components with the main contribution rates.

The results show that PCA cannot handle the features in this case: the features are nonlinear, whereas PCA is suited to linear data sets, so the PCA result is poor. KPCA handles the nonlinear features better than PCA; however, some features still cannot be separated by KPCA, whereas the weighted KPCA separates them well, which demonstrates the effectiveness of the weighted KPCA.

Based on the extracted features, the Morlet wavelet kernel SVM is used to identify the bearing running state. The features of the five running states are input into the new SVM ($w_{0}$ = 5, a = 0.3) for training and recognition. The proposed method is compared with two other methods: (1) the LMD Shannon entropy features are extracted and input directly into the Morlet wavelet kernel SVM model; (2) the LMD Shannon entropy features are extracted and processed by the weighted KPCA, and an RBF kernel SVM (kernel parameter $γ$ = 0.5, penalty factor C = 10) is used; and (3) the proposed model. The results are shown in Table 2.

Table 2.

The recognition rate, η (%), of different methods.

Methods	Normal state	0.18-mm fault depth	0.36-mm fault depth	0.53-mm fault depth	0.71-mm fault depth
Without weighted KPCA	60	76	74	73	90
RBF kernel SVM	88	89	93	90	98
The proposed method	100	100	100	100	100

RBF: radial basis function; KPCA: kernel principal component analysis; SVM: support vector machine.

Table 2 shows that when the Shannon entropy features are input directly into the Morlet wavelet kernel SVM model, the results are unsatisfactory. This is because the features are high dimensional and their main characteristics are buried, so the recognition model cannot distinguish the different types of features. The RBF kernel SVM model also has some difficulty recognizing the different states, because the RBF SVM is less effective at handling local minima, whereas the Morlet wavelet kernel is good at capturing minor differences between features and can solve this problem.
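The mean recognition rate over the five running states summarizes Table 2 with one number per method. The computation below is only an arithmetic aid on the tabulated values; weighting the states equally is an assumption that matches the equal number of test groups per state.

```python
from statistics import mean

# Per-state recognition rates (%) from Table 2: normal, 0.18, 0.36, 0.53, 0.71 mm
table2 = {
    "Without weighted KPCA": [60, 76, 74, 73, 90],
    "RBF kernel SVM": [88, 89, 93, 90, 98],
    "Proposed method": [100, 100, 100, 100, 100],
}
averages = {name: mean(rates) for name, rates in table2.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.1f} %")
```

The means are 74.6%, 91.6%, and 100%, respectively.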

Case 2

To further validate the proposed method, it is applied to an actual test rig, shown in Figure 8.

Figure 8.

The test rig.

The rolling bearings are mounted on a shaft driven by a 0.55-kW AC motor, whose speed is regulated by an AC inverter controller. The rotation speed is set at 1000 r/min. A magnetic clutch and brake serves as the brake and radial loading device, with a maximum torque of 5 N·m, and the radial load applied to the bearing is 29.4 N. Vibration data are collected every 2 h at a sampling rate of 25,600 Hz, with 102,400 points collected each time, as shown in Figure 9. The bearing is run for 12 months, and a set of data is chosen every 2 months. In this way, 60 groups of data of the different running states are obtained, 30 groups for testing and the other 30 for training, and 4096 data points are selected for analysis. The collected signal is decomposed by LMD, and the result is shown in Figure 10.

Figure 9.

The collected vibration signal.

Figure 10.

The decomposed vibration signal.

The proposed method is then used to recognize the bearing running state, starting from the original features extracted from the PFs obtained by LMD (Figure 10).

The Shannon entropy of each PF is then calculated, and the weighted KPCA is applied to the resulting entropy features. One group of features for each running state, before normalization, is shown in Table 3.

Table 3.

One group of LMD energy entropy features for the different running states of the actual signal.

Bearing running states	H₁	H₂	H₃	H₄	H₅
Normal state	1.2657	1.2316	1.1091	1.1012	1.1316
Working for 2 months	1.0684	1.0201	1.2452	1.2566	1.3910
Working for 4 months	0.9506	0.9896	0.9597	1.1166	0.9259
Working for 6 months	0.8923	0.8369	0.8949	0.8283	0.8675
Working for 8 months	0.6031	0.5108	0.6392	0.6967	0.7042
Working for 10 months	0.7663	0.7095	0.7953	0.7325	0.8034
Working for 12 months	0.7509	0.7993	0.8331	0.7408	0.8296

LMD: local mean decomposition.
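Table 3 lists the entropy values before normalization, while the features are normalized before being input to the weighted KPCA. The source does not state which normalization is used; the sketch below assumes per-feature min–max scaling to [0, 1] across the seven running states.

```python
# Table 3: one row of LMD energy entropy features H1..H5 per running state
table3 = [
    [1.2657, 1.2316, 1.1091, 1.1012, 1.1316],  # normal state
    [1.0684, 1.0201, 1.2452, 1.2566, 1.3910],  # 2 months
    [0.9506, 0.9896, 0.9597, 1.1166, 0.9259],  # 4 months
    [0.8923, 0.8369, 0.8949, 0.8283, 0.8675],  # 6 months
    [0.6031, 0.5108, 0.6392, 0.6967, 0.7042],  # 8 months
    [0.7663, 0.7095, 0.7953, 0.7325, 0.8034],  # 10 months
    [0.7509, 0.7993, 0.8331, 0.7408, 0.8296],  # 12 months
]


def min_max_columns(rows):
    """Scale every column (feature) to [0, 1] independently (assumed normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) for v, l, h in zip(row, lo, hi)] for row in rows]


normalized = min_max_columns(table3)
for row in normalized:
    print(" ".join(f"{v:.3f}" for v in row))
```

Scaling each feature separately keeps features with larger raw ranges from dominating the RBF distances in equation (4).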

The Morlet wavelet kernel SVM is then applied to the features processed by the weighted KPCA to recognize the running state. The results are shown in Table 4.

Table 4.

The recognition results based on the proposed method (recognition rate, η (%)).

Working states	Recognition rate, η (%)
Normal state	100
Working for 2 months	91
Working for 4 months	92
Working for 6 months	96
Working for 8 months	94
Working for 10 months	95
Working for 12 months	99

Table 4 shows that the recognition rates of the proposed method range from 91% to 100%. For special machines, for example bearings working in space, the method can recognize the running state from the collected vibration signals, which validates its effectiveness.

Conclusion

In this research, the LMD Shannon entropy method is used to extract the original features of the signals. To reduce the dimension of the entropy features, the weighted KPCA is used, and a Morlet wavelet kernel SVM is constructed for recognition. The recognition results are validated with both test-bench and actual collected signals. However, the method has not yet been applied in industry; this will be addressed in future research.

Footnotes

Acknowledgements

The authors are grateful to the anonymous reviewers for their helpful comments and constructive suggestions.

Academic Editor: Amir Alavi

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the National Natural Science Foundation of China (Nos 51405047, 51305472), The Scientific Research Fund of Chongqing Municipal Education Commission (No. KJ1500529), Chongqing Postdoctoral Science Foundation funded project (No. xm2015001), Science application research project of COSCO, China (Grant No. 2010-1-H-001), The Key Laboratory of Road Construction Technology and Equipment (Chang’an University), (No. 2014SZS11-K02), China Postdoctoral Science Foundation funded this research, Project no. 2014M552316, and the basic and advanced research key project of Chongqing (No. cstc2015jcyjBX0140).

References

1. Dong, Sun, Tang BP. A fault diagnosis method for rotating machinery based on PCA and Morlet Kernel SVM. Math Probl Eng 2014; 842: 805–808.

2. LF. Degradation process prediction for rotational machinery based on hybrid intelligent model. Robotics CIM: Int Manuf 2012; 28: 190–207.

3. Cheng, Yang. A rotating machinery fault diagnosis method based on local mean decomposition. Digit Signal Process 2012; 22: 356–366.

4. Sun, Xiao, Wen JT. Natural gas pipeline small leakage feature extraction and recognition based on LMD envelope spectrum entropy and SVM. Measurement 2014; 55: 434–443.

5. Deepa, Sanjay. Intuitionistic fuzzy entropy and distance measure based TOPSIS method for multi-criteria decision making. Egypt Inform J 2014; 15: 97–104.

6. De Luca, Termini. Definition of a non probabilistic entropy in the setting of fuzzy set theory. Inform Control 1972; 20: 301–312.

7. Szmidt, Kacprzyk. Distances between intuitionistic fuzzy sets. Fuzzy Set Syst 2000; 114: 505–518.

8. De Pinho, Marcelo, Jose RCP. Shannon’s entropy applied to the analysis of tonotopic reorganization in a computational model of classical conditioning. Neurocomputing 2002; 44–46: 359–364.

9. Dong, Luo TH. Bearing degradation process prediction based on the PCA and optimized LS-SVM model. Measurement 2013; 46: 3143–3152.

10. Yang YP. Ensemble kernel principal component analysis for improved nonlinear process monitoring. Ind Eng Chem Res 2015; 54: 318.

11. Wang, Jiang, Gou. Extended local tangent space alignment for classification. Neurocomputing 2012; 77: 261–266.

12. Minimal Euclidean distance chart based on support vector regression for monitoring mean shifts of auto-correlated processes. Int J Prod Econ 2013; 141: 377–387.

13. Gryllias, Antoniadis IA. A support vector machine approach based on physical model training for rolling element bearing fault detection in industrial environments. Eng Appl Artif Intel 2012; 25: 326–344.

14. Kuang WH. A novel hybrid KPCA and SVM with GA model for intrusion detection. Appl Soft Comput 2014; 18: 178–184.

15. Zhang, Zuo HF. Classification of fault location and performance degradation of a roller bearing. Measurement 2013; 46: 1178–1189.

16. Kononenko. Estimating attributes: analysis and extensions of RELIEF. In: Proceedings of the 7th European conference on machine learning, Catania, 6–8 April 1994, pp.171–182. Berlin: Springer.

17. Cristianini, Taylor JS. An introduction to support vector machines and other kernel-based learning methods. New York: Cambridge University Press, 2000.

18. Product demand forecasts using wavelet kernel support vector machine and particle swarm optimization in manufacture system. J Comput Appl Math 2010; 233: 2481–2491.

19. Case Western Reserve University Bearing Data Center, 2009, http://www.case.edu/source/