Fault diagnosis method based on wavelet packet-energy entropy and fuzzy kernel extreme learning machine

Abstract

To address the inherent limitations of the extreme learning machine in practice, a new fault diagnosis method based on wavelet packet-energy entropy and fuzzy kernel extreme learning machine is proposed. On one hand, the presented method extracts more effective features using the wavelet packet-energy entropy method; on the other hand, the sample fuzzy membership degree matrix U, the weight matrix W, which describes the sample imbalance, and the kernel function are introduced to construct a fuzzy kernel extreme learning machine model with high accuracy and reliability. Experimental results for a rolling bearing and a check valve are obtained and analyzed in MATLAB 2010b. The results show that the proposed fuzzy kernel extreme learning machine method achieves comparable or slightly better classification performance than the traditional extreme learning machine, kernel extreme learning machine, back propagation, support vector machine, and fuzzy support vector machine.

Keywords

Rolling bearing, check valve, wavelet packet-energy entropy, fuzzy kernel extreme learning machine, fault diagnosis

Introduction

The extreme learning machine (ELM) was proposed to obtain faster training speed and better generalization performance in research on the single-hidden layer feed-forward neural network (SLFN).¹ As a rapidly developing and effective machine learning technique, ELM has become a focus of research in many fields since it was proposed.^2,3 This work mainly focuses on the application of ELM to fault classification. Liu et al.⁴ proposed the extreme support vector machine (E-SVM) by substituting the ELM kernel for the SVM kernel to obtain better results. However, the widespread application of ELM to classification should be attributed to the significant finding that an ELM-based classification problem can, in theory, be turned into an optimization problem.⁵ Since then, ELM has been rapidly developed and applied in the field of fault diagnosis.^6–8 Hu et al.⁹ proposed a multi-stage ELM to diagnose the faults of a hydraulic tube and improve the accuracy of clustering. In this method, the input data sets were divided into several stages, and every stage was then analyzed independently using ELM. Compared to an individual ELM, this method can obtain better performance. Xu et al.¹⁰ presented an extension sample classification-based extreme learning machine ensemble (ESC-ELME) method to achieve higher accuracy and faster response in complex fault diagnosis processes. Essentially, however, ESC-ELME is an integration of multiple ELMs. Sharma et al.¹¹ proposed a multi-class ELM method to monitor the external faults of excited induction motors and compared its performance with a multilayer perceptron neural network. The results revealed that the ELM algorithm is considerably faster and can reduce the computational load.

However, in actual production processes, the inherent limitations of ELM hinder its wide application. These limitations include, but are not limited to, (1) how to determine the number of neurons, the transfer function of the hidden layer, and the parameters of ELM and (2) how to choose sensitive features by combining ELM with a suitable data processing algorithm when constructing the model. Addressing these questions by improving ELM is therefore one of the most important and difficult points in the research and application of ELM.^1–2 Razavi-Far et al.¹² proposed an adaptive incremental ensemble of ELMs for fault diagnosis of multiple bearing defects. The proposed method contains two major units, data processing and decision-making. It received time-domain feature subsets chunk by chunk, incrementally trained a set of ELMs to classify bearing defects, allowed the ELMs to cross-check one another, and adjusted their decisions to diagnose new classes of defects. Yang et al.¹³ developed a new ELM optimized by quantum-behaved particle swarm optimization (QPSO) to diagnose faults of the gas turbine fan engine. QPSO was used to select optimal network parameters, including the number of hidden layer neurons and the norm of the output weights. The results showed that the proposed method was more reliable and suitable than conventional neural networks and other ELM methods for the defect diagnosis of the gas turbine engine. Luo et al.¹⁴ proposed a hybrid system, named the hybrid gravitational search algorithm-ELM, to diagnose the faults of rolling element bearings. In this method, the real-valued gravitational search algorithm (RGSA) was employed to optimize the input weights and bias of ELM, and the binary-valued gravitational search algorithm (BGSA) was used to select important features from a compound feature set.
Lu et al.¹⁵ developed a novel method, named the dual reduced kernel extreme learning machine (DR-KELM), to improve the sparsity of the kernel extreme learning machine (KELM); it incorporated the traditional greedy forward learning algorithm into the backward learning algorithm to gain more sparsity and further reduce testing time. The application of DR-KELM to aero-engine fault diagnosis also demonstrated its superior performance with a sparser structure. Wu et al.¹⁶ proposed a new fault diagnosis method based on an improved extreme learning machine (IELM) to overcome the weaknesses (weak generalization ability, low diagnostic rate) of traditional fault diagnosis with feed-forward neural network algorithms. By introducing principal component analysis, the proportion of each input parameter was determined according to the contribution rate of each input vector and was used as a weight parameter of ELM. Chen et al.¹⁷ presented an optimized KELM to diagnose the faults of photovoltaic arrays. In this method, the Nelder–Mead simplex method was employed to optimize the parameters of KELM that directly affect the classification performance. The experimental results showed that the proposed method improved the reliability, efficiency, and safety of the photovoltaic power station. Rauber et al.¹⁸ systematically compared the classification performance of ELM with alternative classification models, such as SVM, decision tree, and random forest. The developed method was applied to diagnose the faults of a complex motor pump system and achieved better results. Mao et al.¹⁹ proposed an online sequential prediction method based on ELM for the imbalanced fault diagnosis problem. This strategy can effectively reduce the information loss when dealing with data imbalance. Yao et al.²⁰ combined the multi-scale intrinsic mode function permutation entropy with ELM to propose a new intelligent approach for the fault diagnosis of bearings.
The method derived more fault information and extracted multidimensional features from the time–frequency domain to reflect the bearing conditions from different perspectives. Wong et al.²¹ showed a new application of ELM in a real-time fault diagnostic system for rotating machinery, and the system was successfully developed to monitor the component conditions of a gas turbine generator. Rather than working from the viewpoint of network structure, Zhao et al.²² addressed the limitation of the many hidden nodes required by traditional ELM from the point of view of data structure. Moreover, for variable operating conditions, an intelligent fault diagnosis method based on local mean decomposition–singular value decomposition (SVD) and ELM was proposed by Tian et al.²³

With its fast development, ELM has achieved notable research and application results. However, ELM still has broad room for development in both theory and application, for example in handling data imbalance, data outliers, missing data, massive data, and nonlinearity. Therefore, a new fault diagnosis method based on wavelet packet-energy entropy (WPEE) and the fuzzy kernel extreme learning machine (F-KELM) is proposed. In the first step, a compound feature set consisting of the energy distribution coefficients and the energy entropy is extracted using WPEE. Then, a novel F-KELM is proposed by combining the fuzzy membership degree matrix U and the weight matrix W. At the same time, the particle swarm optimization (PSO) algorithm is applied to optimize the parameters of F-KELM. Finally, comparison tests between the proposed method and state-of-the-art methods (ELM, KELM, SVM, F-SVM, etc.) are carried out. According to the analysis of the experimental results, the main contributions and conclusions are summarized as follows:

The sensitive and transient features are extracted using the WPEE method in massive data to improve the recognition accuracy and reliability of the subsequent classification models.

The kernel function is introduced in the data processing to deal with these issues, including sample nonlinear mapping, the number of neurons and the output matrix H of the hidden layer. According to the input and output sample properties, the proper kernel function is selected to calculate the output matrix H.

The sample membership degree matrix U and the weight matrix W, which describes the sample imbalance, are introduced to construct the F-KELM classification model with high accuracy and reliability. The model not only achieves performance similar to that of the original ELM, KELM, back propagation (BP), SVM, and F-SVM but also addresses the above-mentioned problems of the original ELM.

The remainder of the article is organized as follows. Section “Background theory” outlines the principles of wavelet packet decomposition, ELM, KELM, and F-KELM. The specific implementation process of the proposed method is given in section “Fault diagnosis method based on WPEE and F-KELM.” The experimental results are analyzed in section “Performance evaluation.” Section “Conclusion” ends this article with conclusions and future work.

Background theory

Wavelet packet decomposition

Wavelet packet analysis can decompose and reconstruct a signal with little loss of information. Compared to wavelet analysis, wavelet packet analysis can not only decompose the low-frequency part but also achieve secondary decomposition of the high-frequency part. So, its signal analysis ability is stronger.²⁴

The subspace $U_{j}^{n}$ is defined as the closure of the space spanned by the function $u_{n} (t)$ . Assume $g_{j}^{n} (t) \in U_{j}^{n}$ ; then $g_{j}^{n} (t)$ is given as

g_{j}^{n} (t) = \sum_{l} d_{l}^{j, n} u_{n} (2^{j} t - l)

(1)

Using the wavelet packet decomposition algorithm, we can calculate $d_{l}^{j, 2 n}$ and $d_{l}^{j, 2 n + 1}$ from $d_{l}^{j + 1, n}$

d_{l}^{j, 2 n} = \sum_{k} a_{k - 2 l} d_{k}^{j + 1, n}

(2)

d_{l}^{j, 2 n + 1} = \sum_{k} b_{k - 2 l} d_{k}^{j + 1, n}

(3)

where $a_{k - 2 l}$ and $b_{k - 2 l}$ are the conjugated filter coefficients of wavelet packet decomposition, respectively.

Using wavelet packet reconstruction algorithm, we can calculate $d_{l}^{j + 1, n}$ through $d_{l}^{j, 2 n}$ and $d_{l}^{j, 2 n + 1}$

d_{l}^{j + 1, n} = \sum_{k} [h_{l - 2 k} d_{k}^{j, 2 n} + g_{l - 2 k} d_{k}^{j, 2 n + 1}]

(4)

where $h_{l - 2 k}$ and $g_{l - 2 k}$ are the conjugated filter coefficients of wavelet packet reconstruction, respectively.
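Equations (2) to (4) can be illustrated with a minimal sketch. It assumes the two-tap orthonormal Haar filters purely for brevity (this article uses db10, whose longer filters also need boundary handling), and it takes the reconstruction filters equal to the decomposition filters, which holds for orthonormal wavelets. The function names `split`, `merge`, and `wavelet_packet` are illustrative only.

```python
import math

# Haar filters (an assumption for brevity; the article itself uses db10).
A = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # low-pass taps  a_0, a_1
B = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # high-pass taps b_0, b_1


def split(d):
    """One decomposition step, equations (2) and (3): parent -> two children."""
    half = len(d) // 2
    # With k = 2l + m, the sum over k of a_{k-2l} d_k reduces to the taps m = 0, 1.
    low = [sum(A[m] * d[2 * l + m] for m in range(2)) for l in range(half)]
    high = [sum(B[m] * d[2 * l + m] for m in range(2)) for l in range(half)]
    return low, high


def merge(low, high):
    """One reconstruction step, equation (4): two children -> parent."""
    d = [0.0] * (2 * len(low))
    for k in range(len(low)):
        for m in range(2):  # m = l - 2k, the only nonzero taps
            d[2 * k + m] = A[m] * low[k] + B[m] * high[k]
    return d


def wavelet_packet(signal, levels):
    """Full packet tree: low- AND high-frequency nodes are split at every level."""
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [child for node in nodes for child in split(node)]
    return nodes


signal = [4.0, 2.0, 5.0, 7.0, 1.0, 3.0, 6.0, 8.0]
leaves = wavelet_packet(signal, 3)   # three layers give 2**3 = 8 terminal nodes
low, high = split(signal)
restored = merge(low, high)          # one-step reconstruction of the input
```

Splitting the high-frequency node as well as the low-frequency node is what distinguishes the packet tree from the ordinary wavelet transform. The nodes are returned in natural (filter-bank) order, not sorted by frequency.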

ELM

The typical structure of the ELM is shown in Figure 1.

Figure 1.

Typical structure of the ELM.

The principle of ELM is briefly described according to Figure 1; for a detailed description, refer to the studies of Huang and colleagues.^1,25 Given N training samples $(x_{i}, y_{i})$ , the mathematical model of the ELM is

H β = Y

(5)

where H = [h₁(x), …, h_L(x)] is the output matrix of the hidden layer, in which L is the number of neurons in the hidden layer; β is the output weight; and Y is the target vector. The calculation of H is the core of ELM model. There are many efficient methods to solve the above problems, such as SVD and generalized inverse.^2,26,27

At the same time, the ELM is essentially similar to a BP neural network with a single hidden layer. The major distinction is that ELM obtains faster training speed by randomly assigning the hidden-layer bias b and the connection weight matrix w instead of tuning them iteratively.

KELM

In ELM, the original problem can be recast as a classification optimization problem. From the perspective of optimization, the principle of ELM is similar to that of SVM and the least squares support vector machine (LSSVM): the ultimate goal is to obtain the minimum training error together with the largest classification margin, which determines the generalization capability. According to the optimization structure of SVM,²⁷ the optimization model of ELM can be formulated as

\begin{matrix} \min L_{KELM} = \frac{1}{2} {‖ β ‖}^{2} + \frac{C}{2} {‖ ξ_{i} ‖}^{2} \\ \text{s.t.} \quad h (x_{i}) β = Y_{i}^{T} - ξ_{i}^{T}, \quad i = 1, \dots, M \end{matrix}

(6)

where $ξ_{i} = [ξ_{i, 1}, \dots, ξ_{i, M}]^{T}$ is the training error vector and C is the regularization parameter (also called the penalty parameter), which balances the minimum training error against the largest classification margin.

Based on the Karush–Kuhn–Tucker (KKT) theorem, the optimal solution of equation (6) is solved and the detailed solution procedure has been described in the study of Huang and colleagues.^1,2,5 Combining equation (5) with equation (6), the optimal solution of equation (6) is given as

β = {\hat{H}}^{+} Y = \begin{cases} H^{T} {\left( \frac{I}{C} + H H^{T} \right)}^{- 1} Y, & N < L \\ {\left( \frac{I}{C} + H^{T} H \right)}^{- 1} H^{T} Y, & L < N \end{cases}

(7)

where ${\hat{H}}^{+}$ denotes the Moore–Penrose generalized inverse of matrix H.

Given new testing samples, the output of ELM is Y′ = sign(h(x)β). According to equation (7), the output Y′ can be expressed as follows

Y' = \begin{cases} \operatorname{sign} \left( h (x) H^{T} {\left( \frac{I}{C} + H H^{T} \right)}^{- 1} Y \right), & N < L \\ \operatorname{sign} \left( h (x) {\left( \frac{I}{C} + H^{T} H \right)}^{- 1} H^{T} Y \right), & L < N \end{cases}

(8)

According to the kernel function mapping theory,²⁸ ELM model based on kernel function, named as KELM, is shown in equation (9)

\begin{matrix} Y' = \operatorname{sign} \left( h (x) H^{T} {\left( \frac{I}{C} + H H^{T} \right)}^{- 1} Y \right) \\ = \operatorname{sign} \left( {\begin{bmatrix} K (x, x_{1}) \\ \vdots \\ K (x, x_{N}) \end{bmatrix}}^{T} {\left( \frac{I}{C} + Ω_{ELM} \right)}^{- 1} Y \right) \end{matrix}

(9)

where $Ω_{ELM} = H H^{T}$ is the kernel matrix with entries ${(Ω_{ELM})}_{i, j} = h (x_{i}) \cdot h (x_{j}) = K (x_{i}, x_{j})$ , $K (u, v) = \exp (- γ {‖ u - v ‖}^{2})$ is the Gaussian kernel function, and $γ$ is the kernel function parameter.
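Equation (9) amounts to one linear solve at training time and one kernel-weighted sum at prediction time. The following is a minimal pure-Python sketch for the binary case, assuming a toy one-dimensional dataset and hand-picked values of C and γ. The helper names `rbf`, `solve`, `kelm_fit`, and `kelm_predict` are illustrative and do not come from the original MATLAB implementation.

```python
import math


def rbf(u, v, gamma):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def kelm_fit(X, Y, C, gamma):
    """Return alpha = (I/C + Omega)^-1 Y, the training part of equation (9)."""
    n = len(X)
    A = [[rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, Y)


def kelm_predict(x, X, alpha, gamma):
    """Y' = sign([K(x, x_1), ..., K(x, x_N)] alpha)."""
    score = sum(rbf(x, xi, gamma) * a for xi, a in zip(X, alpha))
    return 1 if score >= 0 else -1


# Toy data (assumed): two negative and two positive samples on a line.
X = [[-2.0], [-1.5], [1.5], [2.0]]
Y = [-1.0, -1.0, 1.0, 1.0]
alpha = kelm_fit(X, Y, C=10.0, gamma=1.0)
pred_neg = kelm_predict([-1.8], X, alpha, gamma=1.0)
pred_pos = kelm_predict([1.8], X, alpha, gamma=1.0)
```

No hidden-layer size appears anywhere in the sketch: only the kernel matrix is needed, which is the practical meaning of replacing the hidden-layer mapping with a kernel.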

Discussion of the F-KELM

In KELM, the kernel function turns the problem of choosing hidden layer neurons into a kernel-function feature mapping. KELM thereby overcomes the adverse effects of that choice, but it cannot solve the problems of data imbalance, data outliers, and missing data. Therefore, a new classification model, called F-KELM, is proposed by combining the fuzzy membership degree and a weight factor with ELM. The advantages of the proposed method are as follows:

The weight matrix W is used to solve the problem of data imbalance.

The membership degree matrix U can effectively solve the uncertainty problem of dataset category.

The theoretical solution process of the F-KELM is illustrated for the case of binary classification. Consider a binary classification dataset ([X_i,Y_i], i = 1, …, M), in which “Y_i = + 1” denotes a positive sample and “Y_i = −1” a negative sample. The weight matrix W is estimated according to the following principle: W is given a smaller value when X_i comes from the high-proportion (majority) class, and vice versa. At the same time, the sample fuzzy membership degree matrix U is used to describe the uncertainty of the sample category. Equation (6) can then be converted into the following expression

\begin{matrix} \min L_{F - KELM} = \frac{1}{2} {‖ β ‖}^{2} + \frac{CW}{2} μ_{i} {‖ ξ_{i} ‖}^{2} \\ \text{s.t.} \quad h (x_{i}) β = Y_{i}^{T} - ξ_{i}^{T}, \quad i = 1, \dots, M \end{matrix}

(10)

Based on KKT theorem, equation (10) is equivalent to solving the following dual optimization problem

L_{F - K E L M} = \frac{1}{2} {‖ β ‖}^{2} + \frac{C W}{2} μ_{i} {‖ ξ_{i} ‖}^{2} - \sum_{i = 1}^{N} α_{i} (h (x_{i}) β - Y_{i} + ξ_{i})

(11)

where α_i is the Lagrangian multiplier. The optimality conditions based on the KKT theorem are described as follows

\begin{cases} \frac{\partial L_{F - KELM}}{\partial β} = 0 \Rightarrow β = \sum_{i = 1}^{N} α_{i} h {(x_{i})}^{T} = H^{T} α \\ \frac{\partial L_{F - KELM}}{\partial ξ_{i}} = 0 \Rightarrow α_{i} = CW μ_{i} ξ_{i}, \quad i = 1, \dots, N \\ \frac{\partial L_{F - KELM}}{\partial α_{i}} = 0 \Rightarrow h (x_{i}) β - Y_{i} + ξ_{i} = 0, \quad i = 1, \dots, N \end{cases}

(12)

and equation (12) can be equivalently written as

(\frac{U}{C} + WH H^{T}) α = WY

(13)

where the fuzzy membership degree matrix $U = \operatorname{diag} (1 / μ_{1}, \dots, 1 / μ_{N})$ is a diagonal matrix. The procedure for computing U is described in the study of Wang et al.²⁹ Similarly, the output weight β and the predicted output Y′ of the F-KELM model can be expressed as follows

β = {\hat{H}}^{+} Y = \begin{cases} H^{T} {\left( \frac{U}{C} + W H H^{T} \right)}^{- 1} W Y, & N < L \\ {\left( \frac{I}{C} + H^{T} W U^{- 1} H \right)}^{- 1} H^{T} W U^{- 1} Y, & L < N \end{cases}

(14)

\begin{matrix} Y' = \operatorname{sign} \left( h (x) H^{T} {\left( \frac{U}{C} + W H H^{T} \right)}^{- 1} W Y \right) \\ = \operatorname{sign} \left( {\begin{bmatrix} K (x, x_{1}) \\ \vdots \\ K (x, x_{N}) \end{bmatrix}}^{T} {\left( \frac{U}{C} + W Ω_{ELM} \right)}^{- 1} W Y \right) \end{matrix}

(15)
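The step from the optimality conditions (12) to the linear system (13) is stated without intermediate algebra. A short sketch of that step follows, assuming that W acts as a diagonal matrix of per-sample weights W_1, …, W_N (the text writes it as a single symbol) and that U = diag(1/μ_1, …, 1/μ_N) as defined after equation (13).

```latex
\begin{aligned}
\xi_{i} &= \frac{\alpha_{i}}{C W_{i} \mu_{i}}
  && \text{(second condition of (12))} \\
h(x_{i}) H^{T} \alpha + \frac{\alpha_{i}}{C W_{i} \mu_{i}} &= Y_{i}
  && \text{(insert } \beta = H^{T} \alpha \text{ and } \xi_{i} \text{ into the third condition)} \\
\left( H H^{T} + \frac{1}{C} W^{-1} U \right) \alpha &= Y
  && \text{(stack the rows } i = 1, \dots, N \text{)} \\
\left( \frac{U}{C} + W H H^{T} \right) \alpha &= W Y
  && \text{(left-multiply by } W \text{)}
\end{aligned}
```

With U = I and W = I the last line reduces to the KELM system (I/C + HHᵀ)α = Y that underlies equation (9).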

For m-category cases, the predicted category label of a test set is the index number of the output node, which has the maximum output value

label (x) = \underset{j \in \{1, \dots, m\}}{\arg \max} \; Y'_{j} (x)

(16)
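For the binary case, equations (13) and (15) can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the authors' MATLAB code: W is taken as a diagonal matrix with the hypothetical weighting 1 / (class size), the membership degrees μ_i are set by hand (the article obtains U by density-based clustering and tunes the parameters with PSO), and the names `fkelm_fit` and `fkelm_predict` are illustrative.

```python
import math


def rbf(u, v, gamma):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fkelm_fit(X, Y, C, gamma, w, mu):
    """Solve (U/C + W Omega) alpha = W Y, with U = diag(1/mu_i), W = diag(w_i)."""
    n = len(X)
    A = [[w[i] * rbf(X[i], X[j], gamma) + (1.0 / (C * mu[i]) if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [w[i] * Y[i] for i in range(n)]
    return solve(A, rhs)


def fkelm_predict(x, X, alpha, gamma):
    """Y' = sign([K(x, x_1), ..., K(x, x_N)] alpha), as in equation (15)."""
    score = sum(rbf(x, xi, gamma) * a for xi, a in zip(X, alpha))
    return 1 if score >= 0 else -1


# Imbalanced toy set (assumed): four negative samples, one positive sample.
X = [[-3.0], [-2.5], [-2.0], [-1.5], [2.0]]
Y = [-1.0, -1.0, -1.0, -1.0, 1.0]
counts = {label: Y.count(label) for label in set(Y)}
w = [1.0 / counts[y] for y in Y]   # hypothetical rule: majority class gets smaller weight
mu = [1.0, 1.0, 1.0, 0.5, 1.0]     # hypothetical memberships; 0.5 marks a doubtful sample
alpha = fkelm_fit(X, Y, C=10.0, gamma=1.0, w=w, mu=mu)
pred_neg = fkelm_predict([-1.8], X, alpha, gamma=1.0)
pred_pos = fkelm_predict([1.8], X, alpha, gamma=1.0)
```

A sample with a small μ_i or a small w_i gets a larger diagonal term relative to its kernel row, so its training error is penalized less. For m categories, one such score would be computed per output node and the label taken as the index of the largest score, as in equation (16).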

Fault diagnosis method based on WPEE and F-KELM

The implementation process of the proposed method is shown in Figure 2 and includes vibration signal pre-processing, feature extraction, and pattern recognition. The signal is normalized in the pre-processing stage. Feature extraction and pattern recognition are the key parts of the proposed method and are discussed in detail below.

Figure 2.

Implementation process of the proposed method.

Feature extraction

When a fault occurs, the stiffness of the mechanical structures around the fault changes, which can produce an impulse or shock. This impulse may in turn alter the collected vibration signal, changing the amplitudes and distributions of the time-domain signals. Therefore, time-domain statistical features can be extracted from the time-domain waveforms to reflect mechanical faults. Moreover, the frequency spectrum, one of the most widely used analysis tools, reflects the frequency components and distribution of a signal, so it is vital for fault diagnosis to extract some indicators in the frequency domain. With the advent of time–frequency analysis techniques such as the wavelet transform and empirical mode decomposition, effective features can be extracted from the time–frequency distribution of a signal, overcoming some limitations of features extracted from the time domain or the frequency domain alone. In the time–frequency domain, the feature extraction in this article makes full use of the advantages of wavelet packets and energy entropy. The energy distribution coefficients and energy entropy of the third-layer wavelet packet decomposition coefficients are extracted as characteristic parameters for the subsequent classification models.³⁰ Based on the above considerations, the features shown in Table 1 are extracted to construct fault diagnosis models, and the validity of the proposed features is compared with that of the time-domain and frequency-domain features.

Table 1.

Statistical features under different domains.

Domain	Number	Features
Time domain	3	Kurtosis: $X_{kurtosis} = \sum_{n = 1}^{N} {(x (n) - X_{m})}^{4} / ((N - 1) X_{sd}^{4})$ ; Impulse factor: $X_{impulse} = X_{rms} / (\frac{1}{N} \sum_{n = 1}^{N} | x (n) |)$ ; Clearance factor: $X_{clearance} = X_{peak} / X_{root}$
Frequency domain	2	Mean frequency: $p_{1} = X_{mf} = \sum_{k = 1}^{K} s (k) / K$ ; Spectrum energy: $p_{4} = \sum_{k = 1}^{K} {(s (k) - p_{1})}^{4} / (\frac{K}{K - 1} {(\sum_{k = 1}^{K} {(s (k) - p_{1})}^{2})}^{2})$
Time–frequency domain (proposed WPEE features)	2	Wavelet packet-energy distribution coefficient P and energy entropy H

In Table 1, x(n) is a signal series for n = 1,2, …, N and N is the number of data points. $X_{m} = \frac{1}{N} \sum_{n = 1}^{N} x (n)$ , $X_{sd} = \sqrt{\frac{1}{N - 1} \sum_{n = 1}^{N} {(x (n) - X_{m})}^{2}}$ , $X_{rms} = \sqrt{\frac{1}{N} \sum_{n = 1}^{N} {(x (n))}^{2}}$ , $X_{peak} = \max | x (n) |$ , and $X_{root} = {(\frac{1}{N} \sum_{n = 1}^{N} \sqrt{| x (n) |})}^{2}$ represent the mean value, standard deviation, root mean square, peak, and root amplitude, respectively. Feature p₁ may indicate the vibration energy in the frequency domain, and p₄ may describe the convergence of the spectrum power, reflecting the energy of the frequency spectrum; s(k) is the spectrum for k = 1,2, …, K and K is the number of spectrum lines.
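The three time-domain indicators of Table 1 can be computed directly from a sampled signal. The sketch below follows the table as written, so the impulse factor is the RMS value divided by the mean absolute value. The root amplitude is taken as the squared mean of √|x(n)|, which is the usual definition and is an assumption here. The function name `time_domain_features` and the test signal are illustrative.

```python
import math


def time_domain_features(x):
    """Kurtosis, impulse factor, and clearance factor as listed in Table 1."""
    n = len(x)
    mean = sum(x) / n                                           # X_m
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))   # X_sd
    rms = math.sqrt(sum(v * v for v in x) / n)                  # X_rms
    peak = max(abs(v) for v in x)                               # X_peak
    mean_abs = sum(abs(v) for v in x) / n
    # Root amplitude: squared mean of sqrt(|x|) (assumed standard definition).
    root = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    return {
        "kurtosis": sum((v - mean) ** 4 for v in x) / ((n - 1) * sd ** 4),
        "impulse": rms / mean_abs,      # as written in Table 1
        "clearance": peak / root,
    }


# A square-like toy signal: no impulses, so all three indicators stay small.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

A signal containing a sharp impact would raise the kurtosis and the clearance factor relative to this toy signal, which is why such indicators are used to flag faults.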

In order to better describe the WPEE features, the implementation processes of feature extraction are described as follows:

Step 1—Signal decomposition and reconstruction. The vibration signal of rolling bearing or check valve is decomposed into three layers with wavelet packet decomposition method, respectively. In this article, “db10” wavelet is the basic wavelet packet base function.

Step 2—Constructing feature vectors. The feature vector T = [P₃₀, P₃₁, …, P₃₇, H] is composed of wavelet packet-energy distribution coefficient P_3i and energy entropy H. And then, P_3i and H can be calculated by equations (17) and (18)

P_{3 i} = \frac{\sqrt{E_{3 i}}}{\sum_{k = 0}^{L - 1} \sqrt{E_{3 k}}}

(17)

H = - \sum_{i = 0}^{L - 1} P_{3 i}^{2} \log P_{3 i}^{2}

(18)

where E_3i is the energy of the i-th third-layer decomposition node (i = 0, …, L − 1), P_3i is the wavelet packet-energy distribution coefficient, calculated from the signal reconstructed from the third-layer decomposition coefficients, and L denotes the number of third-layer nodes (L = 2³ = 8 for a three-layer decomposition).
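Equations (17) and (18) can be checked numerically with a small sketch. It assumes the natural logarithm in equation (18), since the text does not state the base. The function name `wpee_features` is illustrative, and the node energies are taken as given rather than computed from a real db10 decomposition. As input, the energies are recovered from the normal-bearing row of Table 3 as P², which is valid up to a common scale factor.

```python
import math


def wpee_features(energies):
    """Return T = [P_30, ..., P_37, H] from the third-layer node energies."""
    roots = [math.sqrt(e) for e in energies]
    total = sum(roots)
    p = [r / total for r in roots]                               # equation (17)
    # Equation (18); terms with P = 0 contribute nothing and are skipped.
    h = -sum(q * q * math.log(q * q) for q in p if q > 0.0)
    return p + [h]


# Normal-bearing row of Table 3 (coefficients P_30 ... P_37).
p_table = [0.337003, 0.600165, 0.005351, 0.054996,
           2.49e-05, 0.000619, 0.000832, 0.001009]
features = wpee_features([q * q for q in p_table])
entropy = features[-1]
```

Under the natural-log assumption, `entropy` should come out close to the tabulated H = 0.632728 for this row, which suggests that the natural logarithm is the base used for Table 3.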

Construct classification model

The ELM, KELM, F-KELM, BP, SVM, and F-SVM classification models are established to carry out comparative experiments and illustrate the effectiveness of the presented method. In the aforementioned six classification models, the selection of classification model parameters, which is the key of these classification models, will be elaborated as follows.

Selection of hidden layer neurons

When constructing the ELM model, the number of hidden layer neurons can be chosen by analogy with the BP neural network. The number of hidden layer neurons L is approximately related to the number of input layer neurons N (here N denotes the number of input neurons, not the number of samples), namely, $L = 2 \times N + 1$ .

Fuzzy membership degree matrix U

The clustering method based on density function is employed to calculate the fuzzy membership matrix U. The detailed solving process of matrix U has been demonstrated in the study of Wang et al.²⁹

Imbalance weight matrix W and kernel function parameters $γ$

In F-KELM, PSO method is introduced to optimize parameters W and $γ$ . Figure 3 shows the flow chart of the optimization process of PSO. At the same time, the parameters of SVM and F-SVM have also been optimized using PSO.

Figure 3.

Flowchart of the particle swarm optimization.

In the canonical PSO, each particle i has position z_i and velocity v_i that is updated at each iteration according to equation (19)

{\vec{v}}_{i} = w {\vec{v}}_{i} + c_{1} {\vec{φ}}_{1 i} ({\vec{p}}_{i} - {\vec{z}}_{i}) + c_{2} {\vec{φ}}_{2 i} ({\vec{p}}_{g} - {\vec{z}}_{i})

(19)

where w is the inertia weight described in the study of Rajeswari et al.,³¹ ${\vec{p}}_{i}$ is the best position found so far by particle i, and ${\vec{p}}_{g}$ is the global best position found so far by the swarm. The weights ${\vec{φ}}_{1}$ and ${\vec{φ}}_{2}$ are randomly generated at each step for each particle component. c₁ and c₂ are positive constants called acceleration coefficients (which control the maximum step size the particle can achieve). The position of each particle is updated at each iteration by adding the velocity vector to the position vector

{\vec{z}}_{i} = {\vec{z}}_{i} + {\vec{v}}_{i}

(20)

The inertia weight w (which is a user-defined parameter), together with c₁ and c₂, controls the contribution of past velocity values to the current velocity of the particle. A large inertia weight biases the search toward global exploration, while a smaller inertia weight directs toward fine-tuning the current solutions (exploitation). Suitable selection of the inertia weight and acceleration coefficients can provide a balance between the global and the local search.³¹ The PSO algorithm is composed of five main steps:

Step 1. Initialize the position vector z and associated velocity v of all particles in the population randomly. Then, set a maximum velocity and a maximum particle movement amplitude in order to decrease the cost of evaluation and to get a good convergence rate.

Step 2. Evaluate the fitness of each particle via the fitness function. There are many options when choosing a fitness function, and trial and error is often required to find a good one.

Step 3. Compare the particle’s fitness evaluation with the particle’s best solution. If the current value is better than previous best solution, replace it and set the current solution as the local best. Compare the individual particle’s fitness with the population’s global best. If the fitness of the current solution is better than the global best’s fitness, set the current solution as the new global best.

Step 4. Change velocities and positions using equations (19) and (20).

Step 5. Repeat steps 2–4 until a stopping criterion is satisfied or a predefined number of iterations is completed.
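The five steps above can be sketched as a short pure-Python routine. The sketch minimizes a toy sphere function as a stand-in for the classification-error fitness of F-KELM. The parameter values (w = 0.7, c₁ = c₂ = 1.5, 20 particles, 100 iterations) are illustrative choices and not the settings used in the experiments, and the function names `pso` and `sphere` are hypothetical.

```python
import random


def pso(fitness, dim, bounds, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, v_max=1.0, seed=0):
    """Canonical PSO that minimizes `fitness` inside the box `bounds`."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: random initial positions z and velocities v, with a velocity cap.
    z = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    # Step 2: evaluate the initial fitness of every particle.
    p_best = [zi[:] for zi in z]
    p_val = [fitness(zi) for zi in z]
    g = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]
    # Step 5: repeat for a predefined number of iterations.
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Step 4: velocity update (19) and position update (20).
                phi1, phi2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * phi1 * (p_best[i][d] - z[i][d])
                           + c2 * phi2 * (g_best[d] - z[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                z[i][d] = max(lo, min(hi, z[i][d] + v[i][d]))
            # Steps 2-3: re-evaluate, then update personal and global bests.
            val = fitness(z[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = z[i][:], val
                if val < g_val:
                    g_best, g_val = z[i][:], val
    return g_best, g_val


def sphere(z):
    """Toy fitness with its minimum 0 at the origin."""
    return sum(c * c for c in z)


best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

To tune a classifier in the same way, `fitness` would map a candidate parameter vector, for example (C, γ), to a validation error, and `bounds` would encode the search range.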

Performance evaluation

Fault diagnosis of rolling bearing

Dataset description

The rolling bearing datasets³² of Case Western Reserve University in the United States are used to verify the validity and reliability of the proposed method. The acceleration sensor is installed on the drive end of the motor without load. The data sampling frequency f_s is 12 kHz. The experimental datasets are composed of the vibration signals of the normal condition and of three kinds of faulty bearing (outer race, inner race, and ball faults). The defect diameters are 0.007 and 0.021 in, and the fault depth is 0.011 in.

The data sample for each operating state has length N = 20,480 and is divided into a 2048 × 10 matrix. The data of the seven operating states therefore constitute a 2048 × 10 × 7 sample matrix. For model training, 2048 × 6 × 7 samples are randomly selected as the training dataset, and the remaining 2048 × 4 × 7 samples form the test dataset. The WPEE feature matrix T_train is a 9 × 42 training sample matrix, and T_test is a 9 × 28 test sample matrix. The detailed sample attributes are shown in Table 2.
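The sample counts above follow from simple segmentation arithmetic, sketched below. The sketch assumes that the random 6/4 split is drawn separately within each operating state, which the text implies but does not state outright. Gaussian noise stands in for the real vibration records, and the function name `segment` is illustrative.

```python
import random

SEGMENT_LEN = 2048
SEGMENTS_PER_CLASS = 10
N_CLASSES = 7
N_TRAIN_PER_CLASS = 6


def segment(signal, seg_len):
    """Cut a 1-D record into consecutive, non-overlapping segments."""
    return [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]


rng = random.Random(0)  # fixed seed, only so that the sketch is repeatable
train, test = [], []
for label in range(1, N_CLASSES + 1):
    # Placeholder record of 20,480 points; real data would come from the dataset files.
    record = [rng.gauss(0.0, 1.0) for _ in range(SEGMENT_LEN * SEGMENTS_PER_CLASS)]
    segments = segment(record, SEGMENT_LEN)
    rng.shuffle(segments)                      # random choice of training segments
    train += [(seg, label) for seg in segments[:N_TRAIN_PER_CLASS]]
    test += [(seg, label) for seg in segments[N_TRAIN_PER_CLASS:]]
```

Each segment then yields one nine-dimensional WPEE feature vector, which gives the 9 × 42 training matrix and the 9 × 28 test matrix.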

Table 2.

Attribute table for sample class.

Sample class description	Fault diameter	Sample size	Category label
Normal bearing	–	2048 × 10	1
Outer race fault	0.007 in	2048 × 10	2
Outer race fault	0.021 in	2048 × 10	3
Inner race fault	0.007 in	2048 × 10	4
Inner race fault	0.021 in	2048 × 10	5
Ball fault	0.007 in	2048 × 10	6
Ball fault	0.021 in	2048 × 10	7

Feature extraction

According to the definition of feature extraction in section “Feature extraction,” the WPEE features of rolling bearing can be calculated. The partial features (not all features) extracted by WPEE are shown in Table 3.

Table 3.

Wavelet packet-energy entropy features.

Fault class	Fault diameter	P₀	P₁	P₂	P₃	P₄	P₅	P₆	P₇	H
Normal bearing	–	0.337003	0.600165	0.005351	0.054996	2.49E-05	0.000619	0.000832	0.001009	0.632728
Outer race fault	0.007 in	0.008388	0.00271	0.27745	0.004005	0.002619	0.004481	0.671691	0.028656	0.563551
Outer race fault	0.021 in	0.066441	0.020624	0.232835	0.012713	0.00047	0.002047	0.627203	0.037666	0.563047
Inner race fault	0.007 in	0.046486	0.114714	0.279147	0.065426	0.000626	0.005492	0.425097	0.063011	0.623897
Inner race fault	0.021 in	0.051862	0.010159	0.545139	0.014339	0.000249	0.000972	0.374442	0.002838	0.654778
Ball fault	0.007 in	0.068632	0.030687	0.282121	0.014447	0.000361	0.001833	0.594484	0.007435	0.603181
Ball fault	0.021 in	0.09948	0.043702	0.279276	0.017571	0.001268	0.010389	0.512089	0.036225	0.619825

Bold values represent the sensitive/effective features for different operating states.

Comparing the normal bearing with the faulty bearings, the operating conditions are easily distinguished based on the feature parameter P alone, but P cannot identify bearing faults of different degrees. By combining P with H, however, bearing faults of different degrees can be distinguished efficiently. This indicates that the feature extraction method based on WPEE is effective and reliable. Similarly, in the check valve fault recognition experiment under balanced sample distribution, the above features are taken as the training and test samples of F-KELM, and the classification results for the different features are shown in Figure 4.

Figure 4.

Classification results of different features: (a) time-domain features, (b) frequency-domain features, and (c) WPEE features.

According to the results shown in Figure 4, the WPEE features presented in this article better describe the state characteristics of the check valve and achieve better classification results than the time-domain and frequency-domain features, which supports the effectiveness and feasibility of the extracted features.

Parameter setting of classification model

According to the data description in Table 2, the parameters of the above-mentioned classification models are set as follows: (1) the number of hidden layer neurons L in ELM is 19; (2) for the PSO initialization, the acceleration factors c₁ and c₂ are searched in the range [0, 2] and assigned the empirical value 2; the sample size is searched in the range [20, 40]; r₁ and r₂ are random numbers in the range [0, 1]; the maximum number of iterations and the initial velocity of the population are 300 and 3, respectively; the inertia weight is 0.9; and the update weight of the population is 0.6.

Experimental results of balanced sample distribution

In order to verify the effectiveness of the proposed method, the performance of the presented F-KELM method is compared with that of ELM, KELM, BP, SVM, and F-SVM. Figure 5 displays the classification results of the six classification models for the seven running conditions of the rolling bearing. In the case of balanced sample distribution, the following conclusions can be drawn from Figure 5: (1) the classification accuracy gradually increases and even reaches 100% for the F-KELM and F-SVM algorithms; (2) when the parameters of the BP model are consistent with those of the ELM model, the difference in classification between the two algorithms is not very evident; (3) the result of ELM is consistent with that of SVM in this experiment, which may be related to the fact that ELM is essentially a fusion of BP and SVM. At the same time, when the kernel function is introduced to solve the selection problem of the hidden layer neurons in ELM, the classification accuracy of the ELM algorithm can be further improved; (4) in both the ELM and SVM algorithms, the introduction of the fuzzy mechanism helps to improve the classification accuracy.

Figure 5.

Experimental results of the balanced sample distribution (rolling bearing): (a) ELM, (b) KELM, (c) F-KELM, (d) BP, (e) SVM, and (f) F-SVM.

Figure 6 shows the PSO fitness curves, from which the optimal parameters of the classification models can be obtained: C and γ are optimized as 8.1859 and 0.01, respectively. F-KELM reaches its best fitness after 20 iterations, whereas KELM needs 40. This comparison suggests that the presented F-KELM method can substantially reduce the optimization time of the model and improve its practical applicability. In addition, the effect of the number of hidden layer neurons on the classification performance of ELM is shown in Figure 7.

Figure 6.

PSO fitness curve: (a) optimized process of KELM and (b) optimized process of F-KELM.
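The parameter search behind a fitness curve like the one in Figure 6 can be sketched as follows. This is a minimal illustration only: the function names `pso_minimize` and `toy_error`, the inertia and acceleration coefficients, the search bounds, and the toy objective are all assumptions made for the example. In the article, the fitness would be derived from the classification performance of KELM or F-KELM, which is not reproduced here.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iterations=40,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Basic particle swarm optimization with box constraints (minimization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = []  # best fitness after each iteration (the "fitness curve")
    for _ in range(n_iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clip it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def toy_error(params):
    """Placeholder objective over (C, gamma); NOT the article's fitness."""
    c, gamma = params
    return ((c - 8.0) / 100.0) ** 2 + (gamma - 0.5) ** 2


# Assumed search ranges for C and gamma.
best, best_val, history = pso_minimize(toy_error, [(0.01, 100.0), (0.01, 10.0)])
print(best, best_val)
```

Plotting `history` against the iteration index gives a curve of the same kind as Figure 6; the iteration at which it stops improving corresponds to the iteration counts compared in the text.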

Figure 7.

Comparison and analysis of classification effect for different hidden layer neurons.

Experimental results of imbalanced sample distribution

Similarly, for the imbalanced sample distribution, the parameters C and γ optimized by PSO are 85.8483 and 0.01, respectively. The recognition results of the six classification models are shown in Figure 8.

Figure 8.

Experimental results of the imbalanced sample distribution (rolling bearing): (a) ELM, (b) KELM, (c) F-KELM, (d) BP, (e) SVM, and (f) F-SVM.

Because the data samples are imbalanced, some of the results shown in Figure 8 differ from those obtained with the balanced sample distribution. The main observations are summarized as follows: (1) the misclassification rate of ELM and BP increases markedly in the case of imbalanced distribution; compared with the balanced case, their recognition accuracy drops from 92.8571% to 85.1852% and from 89.2857% to 81.4815%, respectively (Table 4); (2) by introducing the kernel learning mechanism, KELM overcomes the difficulty of selecting the number of hidden layer neurons in ELM and achieves comparable or slightly better performance than SVM and ELM. However, it still cannot remove the effect of imbalanced sample distribution and so does not reach the ideal classification performance; (3) an interesting fact can also be observed in Figure 8(e): although the samples are imbalanced, SVM still achieves better classification performance than ELM. This may be related to the advantage of SVM on small samples. Therefore, to improve the classification accuracy and reliability of ELM under imbalanced sample distribution, ELM must be improved. The experimental results also show that it is reasonable and feasible to introduce the fuzzy mechanism into ELM; (4) Figure 8(f) shows that introducing the fuzzy mechanism in F-SVM also helps to improve the classification accuracy and reliability; (5) in the F-KELM model, the fuzzy membership degree matrix and the sample weight matrix are applied to restrain the influence of imbalanced sample distribution on the classification results and to improve the recognition accuracy and reliability.

Table 4 summarizes the recognition accuracy of the six classification models for the rolling bearing under the two sample distributions.

Table 4.

Experimental results of different sample distributions (rolling bearing).

Accuracy of classification recognition (%)
Sample distribution	ELM	KELM	F-KELM	BP	SVM	F-SVM
Balanced sample distribution	92.8571	96.4286	100	89.2857	92.8571	100
Imbalanced sample distribution	85.1852	92.2963	100	81.4815	92.5926	100

The results in Table 4 show that the proposed F-KELM method obtains comparable or slightly better performance than the other methods. Comparing the results of the proposed method with those of F-SVM also shows that introducing fuzzy membership helps to improve the classification performance. At the same time, the results confirm that the influence of imbalanced sample distribution on the classification performance of the model needs to be considered.

Fault diagnosis of check valve

Dataset description

In this section, data samples from three different running states of the check valve are selected as experimental datasets: normal operation state (NC), stuck valve fault (NK), and valve wear fault (NM). The data record for each operating state contains 61,440 points and is decomposed into a 1024 × 60 matrix, so that a 1024 × 60 × 3 sample matrix covering the three conditions is constructed. The training feature matrix T_train is a 117 × 9 matrix, and the test feature matrix T_test is a 63 × 9 matrix. The category label of the test set is a 63 × 1 matrix. NC, NK, and NM are represented by the category labels 1, 2, and 3, respectively.
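The sample counts above can be checked with a short sketch. It assumes that each 61,440-point record is cut into consecutive, non-overlapping 1024-point segments (the article does not state the segmentation order), uses zero-valued placeholder signals instead of measured data, and stores each segment as a list row rather than as a matrix column. The helper name `segment_signal` is hypothetical.

```python
SEGMENT_LENGTH = 1024
SEGMENTS_PER_STATE = 60
STATES = ["NC", "NK", "NM"]  # category labels 1, 2, 3


def segment_signal(signal, segment_length=SEGMENT_LENGTH):
    """Split a 1-D signal into consecutive, non-overlapping segments."""
    if len(signal) % segment_length != 0:
        raise ValueError("signal length must be a multiple of segment_length")
    return [signal[i:i + segment_length]
            for i in range(0, len(signal), segment_length)]


# Placeholder signals (zeros) standing in for the measured vibration data.
signals = {state: [0.0] * (SEGMENT_LENGTH * SEGMENTS_PER_STATE)
           for state in STATES}

samples = []  # (segment, label) pairs, one per 1024-point segment
for label, state in enumerate(STATES, start=1):
    for segment in segment_signal(signals[state]):
        samples.append((segment, label))

n_train, n_test = 117, 63  # sizes of T_train and T_test in the text
print(len(samples), n_train + n_test)
```

Both printed values are the same, which confirms that the 117 training rows and 63 test rows together account for every segment. Each segment is then reduced to a 9-dimensional feature vector by the feature extraction step, which is not shown here.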

Experimental results of balanced sample distribution

As shown in Figure 9, the recognition accuracies of the six classification models (ELM, KELM, F-KELM, BP, SVM, and F-SVM) are 87.3016%, 92.0635%, 95.2381%, 82.5397%, 90.4762%, and 93.6508%, respectively. (In the KELM and F-KELM models, the optimized parameters C and γ are 97.3364 and 5.5063, respectively.) Comparing Figure 9 with Figure 5, similar conclusions can be drawn: (1) the classification accuracy is gradually improved by combining kernel learning, the fuzzy membership degree matrix, and the sample weight matrix; (2) Figure 9 also shows that the NK fault is easily misclassified as NC or NM. This is mainly because NK is a special fault type lying between NC and NM, so it is easily misclassified when the difference between the signals is not obvious; (3) in the F-KELM model, kernel learning, the fuzzy membership degree matrix U, and the sample weight matrix W are introduced to reduce the misclassification of the NK fault and improve the classification accuracy of the model; (4) as shown in Figure 9, F-KELM obtains better results than the other algorithms, namely ELM, KELM, BP, SVM, and F-SVM. The experimental results further indicate that the proposed F-KELM method is effective for, and offers useful guidance on, the fault diagnosis of check valves.

Figure 9.

Experimental results of the balanced sample distribution (check valve): (a) ELM, (b) KELM, (c) F-KELM, (d) BP, (e) SVM, and (f) F-SVM.
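Because the test set contains 63 samples, each reported accuracy corresponds to a whole number of correctly classified samples. The sketch below recovers those counts from the percentages quoted above; the helper name `correct_count` is hypothetical and the only inputs are the values stated in the text.

```python
N_TEST = 63  # number of test samples stated in the dataset description

reported = {"ELM": 87.3016, "KELM": 92.0635, "F-KELM": 95.2381,
            "BP": 82.5397, "SVM": 90.4762, "F-SVM": 93.6508}


def correct_count(accuracy_percent, n_test):
    """Whole number of correct predictions implied by an accuracy value."""
    return round(accuracy_percent / 100.0 * n_test)


counts = {}
for model, acc in reported.items():
    k = correct_count(acc, N_TEST)
    # The recomputed percentage should reproduce the reported value.
    if abs(100.0 * k / N_TEST - acc) > 1e-3:
        raise ValueError(f"{model}: accuracy is not a multiple of 1/{N_TEST}")
    counts[model] = k
    print(f"{model}: {k}/{N_TEST} correct, {N_TEST - k} misclassified")
```

Read this way, the gap between F-KELM and F-SVM in the balanced check valve experiment amounts to a single test sample, which is worth keeping in mind when interpreting differences of one or two percentage points.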

Experimental results of imbalanced sample distribution

To verify the effectiveness of the presented method in the case of imbalanced sample distribution, vibration datasets from an actual high-pressure diaphragm pump check valve are analyzed. The detailed data distribution is shown in Table 5.

Table 5.

Sample attribute of check valve.

Sample attribute	Sample size	Training sample	Test sample	Category labels
NC	1024 × 70	42 × 9	28 × 9	1
NK	1024 × 20	12 × 9	8 × 9	2
NM	1024 × 20	12 × 9	8 × 9	3
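The degree of imbalance in Table 5 can be made concrete with a small sketch. It reads the first split column as training samples and the second as test samples, and it assumes an inverse-class-size weighting of the kind used in weighted ELM; the weight matrix W defined earlier in this article may be constructed differently, so the weights below are illustrative only.

```python
train_counts = {"NC": 42, "NK": 12, "NM": 12}  # first split column of Table 5
test_counts = {"NC": 28, "NK": 8, "NM": 8}     # second split column, read as test

# Assumed weighting: every sample is weighted by 1 / (size of its class).
class_weight = {c: 1.0 / n for c, n in train_counts.items()}

# Diagonal of an illustrative weight matrix, one entry per training sample.
w_diagonal = [class_weight[c] for c, n in train_counts.items() for _ in range(n)]

imbalance_ratio = max(train_counts.values()) / min(train_counts.values())
n_test = sum(test_counts.values())
print(len(w_diagonal), imbalance_ratio, n_test)
```

With this weighting, each class contributes the same total weight to the training objective even though NC has 3.5 times as many training samples as NK or NM.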

Based on the optimized parameters C and γ (9.1054 and 5.5063, respectively), the recognition results of the six classification models are shown in Figure 10. The results in Figure 10 indicate that the proposed method is effective. The proposed method provides a feasible solution for the fault diagnosis of rolling bearings and check valves, and it has practical significance and reference value.

Figure 10.

Experimental results of the imbalanced sample distribution (check valve): (a) ELM, (b) KELM, (c) F-KELM, (d) BP, (e) SVM, and (f) F-SVM.

Similarly, Table 6 gives the statistical results of classification recognition accuracy of six different classification models for check valve in two different sample distributions.

Table 6.

Experimental results of different sample distributions (check valve).

Accuracy of classification recognition (%)
Sample distribution	ELM	KELM	F-KELM	BP	SVM	F-SVM
Balanced sample distribution	87.3016	92.0635	95.2381	82.5397	90.4762	93.6508
Imbalanced sample distribution	81.8182	86.3636	93.1818	79.5455	86.3636	90.9091

Similar conclusions can be drawn by comparing Table 4 with Table 6. At the same time, it is worth noting that both F-KELM and F-SVM show some performance degradation under the imbalanced distribution in this actual fault diagnosis case, and the degradation of F-SVM is larger than that of F-KELM.
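The degradation mentioned above can be quantified directly from Tables 4 and 6. The sketch below simply subtracts the imbalanced accuracy from the balanced accuracy for each model; the variable and function names are hypothetical, and the numbers are copied from the two tables.

```python
MODELS = ["ELM", "KELM", "F-KELM", "BP", "SVM", "F-SVM"]

table4 = {  # rolling bearing, accuracy in %
    "balanced": [92.8571, 96.4286, 100.0, 89.2857, 92.8571, 100.0],
    "imbalanced": [85.1852, 92.2963, 100.0, 81.4815, 92.5926, 100.0],
}
table6 = {  # check valve, accuracy in %
    "balanced": [87.3016, 92.0635, 95.2381, 82.5397, 90.4762, 93.6508],
    "imbalanced": [81.8182, 86.3636, 93.1818, 79.5455, 86.3636, 90.9091],
}


def accuracy_drop(table):
    """Balanced minus imbalanced accuracy per model, in percentage points."""
    return {m: round(b - u, 4)
            for m, b, u in zip(MODELS, table["balanced"], table["imbalanced"])}


bearing_drop = accuracy_drop(table4)
valve_drop = accuracy_drop(table6)
for model in MODELS:
    print(f"{model}: bearing {bearing_drop[model]}, valve {valve_drop[model]}")
```

For the check valve, the drop is about 2.06 percentage points for F-KELM and about 2.74 for F-SVM, while neither model loses accuracy on the rolling bearing data.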

Conclusion

In this article, a new fault diagnosis method is proposed by analyzing the limitations of ELM and KELM and combining F-KELM with feature extraction based on WPEE. From the experimental cases above, under both balanced and imbalanced sample distributions, the following conclusions can be drawn:

(1) The classification performance of the six classification models (ELM, KELM, F-KELM, BP, SVM, and F-SVM) is better on the experimental platform (rolling bearing) than in the actual field application (check valve). This is mainly because the data from the experimental platform are less disturbed by environmental noise and by other components. This is also why noise pre-processing has become a focus and a difficulty of research in this area.

(2) The experimental results show that the proposed F-KELM method, which combines the kernel learning mechanism with the membership degree matrix U and the weight matrix W, can effectively address the above-mentioned problems of KELM and the original ELM, and achieves comparable or slightly better classification performance than ELM, KELM, BP, SVM, and F-SVM. Furthermore, F-KELM saves optimization time, which improves its practical applicability.

(3) The proposed fault diagnosis method based on WPEE and F-KELM can recognize the operation states of the rolling bearing and the check valve with high accuracy, although to different degrees in the two cases. It is therefore of practical value for guiding the repair and replacement of rolling bearings and check valves.

However, the current F-KELM does not assess the classification performance of multi-kernel functions. In addition, the assumption that all misclassifications carry the same cost is unreasonable in actual applications, because different misclassifications lead to different losses. The classification effect should therefore be judged by how far costly misclassifications are reduced. These issues will be the focus of our future research.

Footnotes

Handling Editor: Dong Wang

Author note

Jun Ma is a postdoctoral fellow.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (61663017, 51169007, and 51765022) and Science & Research Program of Yunnan province (2015ZC005).

References

1. Huang, Song, et al. Trends in extreme learning machines: a review. Neural Networks 2015; 61: 32–48.

2. Zong, Huang, Chen. Weighted extreme learning machine for imbalance learning. Neurocomputing 2013; 101: 229–242.

3. Yang, Huang, Cheng, et al. Haptic identification by ELM-controlled uncertain manipulator. IEEE T Syst Man Cyb 2016; 47: 2398–2409.

4. Liu, Shi, et al. Extreme support vector machine classifier. Lect Notes Comput Sc 2008; 5012: 222–233.

5. Huang, Zhu, Siew, et al. Extreme learning machine: theory and applications. Neurocomputing 2006; 70: 489–501.

6. Huang, Cheng, Chen, et al. Application of rolling bearing for fault diagnosis based an improved extreme learning machine. Mach Des Manuf 2016; 1: 80–83, 87.

7. Yuan, Zhang, Wang, et al. Fault diagnosis of rolling bearings based on a semi-supervised extreme learning machine. Meas Control Technol 2016; 35: 13–16, 21.

8. Sheng, Zhao, et al. Fault diagnosis method of wind turbine main bearing based on extreme learning machine. Renew Energ Resour 2016; 34: 1587–1593.

9. Zhao, Wang, et al. Multi-stage extreme learning machine for fault diagnosis on hydraulic tube tester. Neural Comput Appl 2008; 17: 399–403.

10. Cheng, Zhu. An extension sample classification-based extreme learning machine ensemble method for process fault diagnosis. Chem Eng Technol 2014; 37: 911–918.

11. Sharma, Malik, Khatri, et al. External fault classification experienced by three-phase induction motor based on multi-class ELM. Procedia Comput Sci 2015; 70: 814–820.

12. Razavi-Far, Saif, Palade, et al. Adaptive incremental ensemble of extreme learning machines for fault diagnosis in induction motors. In: Proceedings of international joint conference on neural networks, Anchorage, AK, 14–19 May 2017, pp.1615–1622. New York: IEEE.

13. Yang, Pang, Shen, et al. Aero engine fault diagnosis using an optimized extreme learning machine. Int J Aerospace Eng 2016; 2016: 7892875.

14. Luo, Zhang, et al. Compound feature selection and parameter optimization of ELM for fault diagnosis of rolling element bearings. ISA T 2016; 65: 556–566.

15. Jiang, Huang, et al. Dual reduced kernel extreme learning machine for aero-engine fault diagnosis. Aerosp Sci Technol 2017; 71: 742–750.

16. Fan, et al. Fault diagnosis for wind turbine based on improved extreme learning machine. J Shanghai Jiao Tong Univ 2017; 22: 466–473.

17. Chen, Cheng, et al. Intelligent fault diagnosis of photovoltaic arrays based on optimized kernel extreme learning machine and I-V characteristics. Appl Energ 2017; 204: 912–931.

18. Rauber, Oliveira-Santos, Boldt, et al. Kernel and random extreme learning machine applied to submersible motor pump fault diagnosis. In: Proceedings of international joint conference on neural networks, Anchorage, AK, 14–19 May 2017, pp.3347–3354. New York: IEEE.

19. Mao, Yan, et al. Online sequential prediction of bearings imbalanced fault diagnosis by extreme learning machine. Mech Syst Signal Pr 2016; 83: 450–473.

20. Yao, Yang, Bai, et al. Railway rolling bearing fault diagnosis based on multi-scale intrinsic mode function permutation entropy and extreme learning machine classifier. Adv Mech Eng 2016; 8: 1–9.

21. Wong, Yang, Vong, et al. Real-time fault diagnosis for gas turbine generator systems using extreme learning machine. Neurocomputing 2014; 128: 249–257.

22. Zhao, Song, Pan, et al. Retargeting extreme learning machines for classification and their applications to fault diagnosis of aircraft engine. Aerosp Sci Technol 2017; 71: 603–618.

23. Tian, et al. Rolling bearing fault diagnosis under variable conditions using LMD-SVD and extreme learning machine. Mech Mach Theory 2015; 90: 175–186.

24. Zhang, Liu, et al. Classification of EEG signals based on autoregressive model and wavelet packet decomposition. Neural Process Lett 2017; 45: 365–378.

25. Huang, Zhu, Siew, et al. Extreme learning machine: a new learning scheme of feedforward neural network. In: Proceedings of IEEE international joint conference on neural networks, Budapest, 25–29 July 2004, pp.985–990. New York: IEEE.

26. Doma AME. Generalized inverses of matrices and its applications. J Sound Vib 1983; 35: 16–25.

27. Serre. Matrices: theory and applications. New York: Springer-Verlag, Inc, 2010.

28. Huang, Zhou, Ding, et al. Extreme learning machine for regression and multiclass classification. IEEE T Syst Man Cy B 2012; 42: 513–529.

29. Wang, Liu, Dang, et al. Nonlinear membership function established by single-shot clustering method. J Zhengzhou Univ 2012; 33: 28–30.

30. Bin, Gao, et al. Early fault diagnosis of rotating machinery based on wavelet packets-empirical mode decomposition feature extraction and neural network. Mech Syst Signal Pr 2012; 27: 696–711.

31. Rajeswari, Sathiyabhama, Devendiran, et al. Bearing fault diagnosis using wavelet packet transform, hybrid PSO and support vector machine. Procedia Eng 2014; 97: 1772–1783.

32. Loparo. Bearings vibration data set [EB/OL]. Case Western Reserve University, http://csegroups.case.edu/bearingdatacenter/pages/download-data-file
