A fault diagnosis approach for rolling element bearings based on dual-tree complex wavelet packet transform-improved intrinsic time-scale decomposition, singular value decomposition, and online sequential extreme learning machine

Abstract

The fault diagnosis of rolling element bearings is very important for ensuring the safe operation of rotating machinery. Targeting the nonstationary characteristics of the vibration signals of rolling element bearings, this article proposes a novel approach to bearing fault recognition based on the dual-tree complex wavelet packet transform, improved intrinsic time-scale decomposition, singular value decomposition, and the online sequential extreme learning machine. First, a feature extraction method is presented that combines improved intrinsic time-scale decomposition with the dual-tree complex wavelet packet transform as a preprocessor and a two-step screening process based on the energy ratio, so that the vibration signal is adaptively decomposed into a set of proper rotation components. Second, a matrix is formed from the selected proper rotation components, and singular value decomposition is applied to it to obtain the singular values as the feature vector. Finally, the singular values are input to the online sequential extreme learning machine to diagnose the faults of rolling element bearings. The effectiveness of the proposed method is demonstrated experimentally: the results show that it can effectively extract the fault characteristics and accurately identify the fault patterns.

Keywords

Fault diagnosis, rolling element bearings, dual-tree complex wavelet packet transform, intrinsic time-scale decomposition, singular value decomposition, online sequential extreme learning machine

Introduction

The assessment of the working condition and the identification of faults are critically important for ensuring the safe operation of rolling element bearings in rotating machines. Bearing fault detection can be undertaken using different information carriers such as vibration signals, lubricant information, and acoustic and temperature data.¹ Among them, the vibration signal contains abundant fault information. Consequently, vibration-based analysis is widely used for diagnosing faults of rolling element bearings.² Numerous vibration-based analytical methods, covering both the time domain and the frequency domain, have been presented in the literature for the fault diagnosis of rolling element bearings.³ In addition, another solution for bearing diagnostic tests has been proposed by Villa.⁴ However, the vibration signals of rolling element bearings are nonlinear and nonstationary, which makes it very difficult to clearly detect a bearing fault in the time domain or the frequency domain alone.⁵

Wavelet transform (WT) is a multi-resolution analysis method. It is used widely in the field of machinery fault diagnosis and identification.⁶ However, WT cannot subdivide the high-frequency section of the signal, where the modulation information of the faulty bearing resides. Wavelet packet transform (WPT) is an extension of WT which can split the high-frequency band and possesses better time-frequency localization of signals.⁷ Second-generation wavelet transform (SGWT) is a new wavelet construction method.⁸ Compared with classical WT, SGWT provides an entirely spatial domain interpretation of the transform. The time-frequency resolution of SGWT varies with the decomposition level: it gives good time and poor frequency resolution in the high-frequency sub-bands, and good frequency and poor time resolution in the low-frequency sub-bands.⁹ Dual-tree complex wavelet transform (DTCWT) has also been presented.¹⁰ DTCWT possesses some good properties such as nearly analytic basis functions and near shift-invariance. However, DTCWT cannot split the high-frequency band. In order to obtain a higher resolution in the high-frequency sub-bands, the dual-tree complex wavelet packet transform (DTCWPT) has been constructed. It overcomes the inability of DTCWT to decompose the high-frequency band, so that the original signal can be further decomposed into its approximation and detail components.¹¹

Empirical mode decomposition (EMD) provides a novel method to handle nonstationary signals in the time-frequency domain. It is based on the local characteristic time scales of a signal, can self-adaptively decompose a complicated signal into a set of intrinsic mode functions (IMFs), and can present the local character better than the wavelet method.¹² However, when the EMD method is applied to nonstationary signals, the original signal cannot be decomposed accurately because of mode mixing, end effects, and unexplainable negative frequencies. By adding noise to the original signal and repeatedly calculating the means of the IMFs, Wu and Huang¹³ developed ensemble empirical mode decomposition (EEMD) to improve EMD. Although the EEMD method effectively solves the mode-mixing problem, computing a sufficiently large ensemble mean is time consuming. Aimed at some shortcomings of EMD, a self-adaptive signal processing method named local mean decomposition (LMD) was proposed.¹⁴ Some researchers have applied LMD to the fault diagnosis of rotating machines and illustrated that the LMD method is better than the EMD method with respect to mode mixing, end effects, and so on.¹⁵–¹⁷ However, LMD has its own inherent defects in signal decomposition: when it is used to process transient impact signals, the algorithm usually does not converge. Intrinsic time-scale decomposition (ITD) is a novel signal processing method with high computational efficiency.¹⁸ The signal is decomposed into a set of proper rotation components (PRCs) based on the local characteristic time scale. ITD can extract more meaningful features of the vibration signal than EMD and LMD.

Artificial neural networks (ANNs) have been widely used in the pattern recognition of machine conditions.¹⁹ However, the ANN methods have some drawbacks, such as local optimal solutions, a low convergence rate, obvious over-fitting, and poor generalization when the number of samples is limited.²⁰ Support vector machines (SVMs) have better generalization than ANNs for pattern recognition and guarantee that the local and global optimal solutions are exactly the same.²¹ However, SVM still cannot provide a perfect solution because of the low sparsity of the model, and the margin trade-off parameter must be estimated.²² More recently, the extreme learning machine (ELM) has been developed as an improved learning method for single-hidden layer feed-forward neural networks (SLFNs).²³ In contrast to traditional feed-forward neural network algorithms, ELM has been shown to offer higher learning speed and better generalization performance. Online sequential extreme learning machine (OS-ELM) is an extension of ELM that integrates ELM with online learning, and it can learn samples one-by-one or chunk-by-chunk with fixed or varying chunk sizes.²⁴ Furthermore, OS-ELM learns faster than other sequential algorithms and possesses better generalization performance on numerous benchmarks related to fault recognition.

In order to extract meaningful fault features and obtain high fault classification accuracy, a novel hybrid method based on DTCWPT-improved intrinsic time-scale decomposition (IITD), singular value decomposition (SVD), and OS-ELM is presented for the multi-fault diagnosis of rolling element bearings. The vibration signal is adaptively decomposed into a number of PRCs by DTCWPT-IITD; a two-step screening process based on the energy ratio is introduced to screen the PRCs and remove meaningless feature components. A matrix is formed from the selected PRCs, and SVD is used to decompose it to obtain the singular values as the feature vector. The singular values are input to OS-ELM to identify the fault type.

The rest of the article is organized as follows. In section “Fault feature extraction based on proposed DTCWPT-IITD,” the fault feature extraction based on DTCWPT and IITD is presented. Singular value extraction based on SVD is described in section “Fault singular value extraction based on SVD.” In section “Fault feature classification based on the OS-ELM,” OS-ELM is used to accomplish the fault pattern classification based on the selected features. Section “Experimental analysis and discussion” presents the experimental results and analysis. Finally, the conclusion is drawn in section “Conclusion.”

Fault feature extraction based on proposed DTCWPT-IITD

Fundamental of DTCWPT

DTCWPT is constructed on the basis of DTCWT by repeatedly decomposing both the low-frequency and the high-frequency sub-bands at each scale. DTCWPT is implemented by two parallel and independent WPTs which use two different sets of low-pass and high-pass filters. DTCWPT possesses near shift-invariance and mitigates the problem of frequency mixing, which makes it well suited to detecting bearing faults. The decomposition and reconstruction of DTCWPT are shown in Figure 1.

Figure 1.

Decomposition and reconstruction of DTCWPT: $\{h_{1 - 1}, h_{0 - 1}\}$ and $\{g_{1 - 1}, g_{0 - 1}\}$ are the analysis filters at first-level decomposition; $\{h'_{1 - 1}, h'_{0 - 1}\}$ and $\{g'_{1 - 1}, g'_{0 - 1}\}$ are the reconstruction filters at first-level reconstruction; 2↓ represents downsampling, that is, keeping one sample out of two; 2↑ represents upsampling, that is, increasing the number of sample points.

The decomposition of DTCWPT is performed based on the wavelet packet transform. The original signal is decomposed by DTCWPT with the analysis filters $\{h_{1}, h_{0}\}$ and $\{g_{1}, g_{0}\}$ . The decomposition of the real tree (R-tree) and the imaginary tree (I-tree) is described as follows:¹¹

R-tree decomposition

\left\{ \begin{matrix} p_{j + 1, 2 m}^{Re} (k) = \sum_{n \in Z} h_{0} (n - 2 k) p_{j, m}^{Re} (n) \\ p_{j + 1, 2 m + 1}^{Re} (k) = \sum_{n \in Z} h_{1} (n - 2 k) p_{j, m}^{Re} (n) \end{matrix} \right.

(1)

I-tree decomposition

\left\{ \begin{matrix} p_{j + 1, 2 m}^{Im} (k) = \sum_{l \in Z} g_{0} (l - 2 k) p_{j, m}^{Im} (l) \\ p_{j + 1, 2 m + 1}^{Im} (k) = \sum_{l \in Z} g_{1} (l - 2 k) p_{j, m}^{Im} (l) \end{matrix} \right.

(2)

where $p_{j, m}^{Re} (k)$ and $p_{j, m}^{Im} (k)$ represent the real-tree and imaginary-tree coefficients of the $m$th node at scale $j$ . The coefficients of the dual-tree complex wavelet packet can be written as

p_{j, m} (k) = p_{j, m}^{Re} (k) + i p_{j, m}^{Im} (k)

(3)

The reconstruction of DTCWPT is expressed as²²

R-tree reconstruction

p_{j, m}^{Re} (k) = \sum_{n \in N} [p_{j + 1, 2 m}^{Re} (n) {h^{'}}_{0} (k - 2 n) + p_{j + 1, 2 m + 1}^{Re} (n) {h^{'}}_{1} (k - 2 n)]

(4)

I-tree reconstruction

p_{j, m}^{Im} (k) = \sum_{l \in N} [p_{j + 1, 2 m}^{Im} (l) {g^{'}}_{0} (k - 2 l) + p_{j + 1, 2 m + 1}^{Im} (l) {g^{'}}_{1} (k - 2 l)]

(5)

where $\{h'_{1}, h'_{0}\}$ and $\{g'_{1}, g'_{0}\}$ are the reconstruction filters.
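As a rough illustration of equations (1) and (4), the following sketch performs the analysis and synthesis steps of a single tree with periodic signal extension. The Haar filter pair is an assumption chosen only because it is short and orthogonal; an actual DTCWPT uses two specially designed filter sets, one for each tree, and the function names here are hypothetical.

```python
import numpy as np

# Assumption: Haar filters stand in for the analysis filters h0 (low-pass) and h1 (high-pass).
H0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
H1 = np.array([1.0, -1.0]) / np.sqrt(2.0)

def analysis_step(p, h0=H0, h1=H1):
    """Split the coefficients of one node into its two child nodes, as in eq. (1)."""
    p = np.asarray(p, dtype=float)
    n_out = len(p) // 2
    low = np.zeros(n_out)
    high = np.zeros(n_out)
    for k in range(n_out):
        for i in range(len(h0)):
            n = (2 * k + i) % len(p)  # periodic extension; i plays the role of n - 2k
            low[k] += h0[i] * p[n]
            high[k] += h1[i] * p[n]
    return low, high

def synthesis_step(low, high, h0r=H0, h1r=H1):
    """Rebuild the parent node from its two child nodes, as in eq. (4)."""
    n_out = 2 * len(low)
    p = np.zeros(n_out)
    for n in range(len(low)):
        for i in range(len(h0r)):
            k = (2 * n + i) % n_out  # i plays the role of k - 2n
            p[k] += low[n] * h0r[i] + high[n] * h1r[i]
    return p

def packet_decompose(x, levels):
    """Apply analysis_step to every node at each level: 2**levels nodes result."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nodes = [child for node in nodes for child in analysis_step(node)]
    return nodes
```

For a signal of length $N$, `packet_decompose(x, j)` returns $2^{j}$ nodes of length $N / 2^{j}$ each. Running the same steps with a second filter set would give the imaginary tree, and the two outputs combine into complex coefficients as in equation (3).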

Improved ITD method

Given a signal $X_{t}$ , $L$ is defined as the baseline extraction operator, which extracts the baseline component from $X_{t}$ in a manner that causes the residual to be a proper rotation.²⁵ $X_{t}$ can be decomposed as

X_{t} = L X_{t} + (1 - L) X_{t} = L_{t} + H_{t}

(6)

where $L_{t} = L X_{t}$ is the baseline signal and $H_{t} = (1 - L) X_{t}$ is a proper rotation.

Let $T_{k}$ represent the times of the local extrema of $X_{t}$ , and let all local extreme points $(T_{k}, X_{k})$ $(k = 1, 2, \dots, M)$ be identified. If several successive data points share the same extremal value, $T_{k}$ is taken as the time of the rightmost of these points.²⁵ For any three successive extreme points $(T_{k}, X_{k})$ , $(T_{k + 1}, X_{k + 1})$ , and $(T_{k + 2}, X_{k + 2})$ , the baseline control point $L_{k + 1}$ can be expressed as follows²²

L_{k + 1} = \frac{1}{2} \left\{ X_{k + 1} + \left[ X_{k} + \left( \frac{T_{k + 1} - T_{k}}{T_{k + 2} - T_{k}} \right) (X_{k + 2} - X_{k}) \right] \right\}, \quad k = 1, 2, \dots, M - 2

(7)

where the term in square brackets in equation (7) is the linear interpolation between $(T_{k}, X_{k})$ and $(T_{k + 2}, X_{k + 2})$ evaluated at time $T_{k + 1}$ . It can be seen from equation (7) that the subscript of $L_{k}$ ranges over [2, $M - 1$ ]. To extend the data at the edges, the mirror method is adopted, so that $L_{1}$ and $L_{M}$ can also be calculated according to equation (7). The baseline signal $L_{11} (t)$ is constructed from the control points using cubic spline interpolation. The remaining component $H_{11} (t)$ can then be expressed as

H_{11} (t) = X_{t} - L_{11} (t)

(8)
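The baseline control points can be sketched in a few lines. The code below is only an illustration with hypothetical function names: it locates strict interior extrema (plateaus of equal values and the mirror extension of the end points are not handled) and averages each extremum with the straight line joining its two neighbours, evaluated at $T_{k + 1}$, so the interpolation weight is $(T_{k + 1} - T_{k}) / (T_{k + 2} - T_{k})$.

```python
import numpy as np

def local_extrema(x):
    """Return the indices of strict interior local maxima and minima of x."""
    x = np.asarray(x, dtype=float)
    d = np.diff(x)
    # a sign change of the first difference marks an extremum at the sample in between
    return np.where(d[:-1] * d[1:] < 0)[0] + 1

def baseline_control_points(T, X):
    """Baseline control points L_2 ... L_{M-1} from extrema times T and values X.

    The end points L_1 and L_M are not produced here; they need an
    extension of the extrema sequence (for example by mirroring).
    """
    T = np.asarray(T, dtype=float)
    X = np.asarray(X, dtype=float)
    # value at T_{k+1} of the straight line through (T_k, X_k) and (T_{k+2}, X_{k+2})
    interp = X[:-2] + (T[1:-1] - T[:-2]) / (T[2:] - T[:-2]) * (X[2:] - X[:-2])
    return 0.5 * (X[1:-1] + interp)
```

A cubic spline through these control points (with the end points added) would then give the baseline $L_{11} (t)$, and subtracting it from the signal gives $H_{11} (t)$.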

In order to obtain a meaningful PRC, the standard deviation (SD) criterion is introduced. $H_{11} (t)$ is taken as a candidate for the first PRC, and the above computations are repeated $k$ times until $H_{1 k} (t)$ satisfies the SD criterion. $H_{1 k} (t)$ is then regarded as the first PRC of the original signal and is denoted as $PRC_{1} (t)$ . The new signal $u_{1} (t)$ is given as follows

u_{1} (t) = X_{t} - PRC_{1} (t)

(9)

Then $u_{1} (t)$ is taken as the new initial signal, and the above computations are repeated until the ratio $d$ exceeds 0.99²⁶

d = \frac{\sum_{i = 1}^{p} \mathrm{Var} (PRC_{i} (t))}{\sum_{i = 1}^{p} \mathrm{Var} (PRC_{i} (t)) + \mathrm{Var} (u_{p} (t))}

(10)

Finally, the given signal $X_{t}$ can be expressed as the sum of PRCs and a residual signal $u_{p} (t)$

X_{t} = \sum_{i = 1}^{p} PRC_{i} (t) + u_{p} (t)

(11)
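The termination rule of equation (10) amounts to comparing the variance captured by the extracted PRCs with the variance left in the residual. A minimal sketch, with hypothetical function names and the 0.99 threshold taken from the text:

```python
import numpy as np

def variance_ratio(prcs, residual):
    """Ratio d of eq. (10): variance held by the PRCs over PRC plus residual variance."""
    prc_var = sum(np.var(c) for c in prcs)
    return prc_var / (prc_var + np.var(residual))

def decomposition_finished(prcs, residual, threshold=0.99):
    """True once the extracted PRCs account for more than `threshold` of the variance."""
    return variance_ratio(prcs, residual) > threshold
```

In a decomposition loop, `decomposition_finished` would be checked after each new PRC is subtracted from the current residual.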

DTCWPT-IITD method for fault feature extraction

In this method, the combination of IITD with DTCWPT as a preprocessor and a two-stage screening process based on energy criteria is called DTCWPT-IITD. By computing a $j$-level dual-tree complex wavelet packet decomposition, the signal $x (t)$ is decomposed into $2^{j}$ sets of coefficients $C_{j, m}$ , each of length $N / 2^{j}$ ,²⁷ where the node index at level $j$ is $m = 0, 1, 2, \dots, 2^{j} - 1$ . The decomposition coefficients $C_{j, m}$ are given by

C_{j, m} = \{ p_{j, m} (k) = p_{j, m}^{Re} (k) + i p_{j, m}^{Im} (k) \mid k = 1, 2, \dots, N / 2^{j} \}

(12)

From $C_{j, m}$ , a reconstructed signal $p_{j}^{m} (t)$ of length $N$ can be obtained by implementing the reconstruction of DTCWPT. In the reconstruction procedure, the coefficients of the node being reconstructed are retained, and all other coefficients at level $j$ are set to zero. Then, to assess the significance of the feature packets, a first-level screening process is introduced, which uses the energy ratio $L_{j}^{m}$ between the residual energy $v_{j, m}^{2} (t)$ , with $v_{j, m} (t) = x (t) - p_{j}^{m} (t)$ , and the signal energy $x^{2} (t)$

L_{j}^{m} = \frac{\int v_{j, m}^{2} (t) d t}{\int x^{2} (t) d t} = \frac{\int {(x (t) - p_{j}^{m} (t))}^{2} d t}{\int x^{2} (t) d t}

(13)

If the first-level energy ratio $L_{j}^{m}$ is less than 0.99, the corresponding $p_{j}^{m} (t)$ is regarded as a meaningful signal feature and stored in the data set $p_{i} (t)$ of selected feature packets. Otherwise, the corresponding $p_{j}^{m} (t)$ is eliminated as an unnecessary signal feature. After the first-level screening, each selected feature packet $p_{i} (t)$ is decomposed by IITD, and the PRCs obtained from each selected feature packet are saved as the preselected PRC data set $C_{i, n} (t)$ . Next, in order to pick out from $C_{i, n} (t)$ the meaningful PRCs which carry useful information about the original signal $x (t)$ , a second-level screening process is introduced. It uses the energy ratio $L_{i, n}$ expressed in equation (14)²⁷

L_{i, n} = \frac{\int v_{i, n}^{2} (t) d t}{\int p_{i}^{2} (t) d t} = \frac{\int {(p_{i} (t) - C_{i, n} (t))}^{2} d t}{\int p_{i}^{2} (t) d t}

(14)

The energy ratio plays an important part in extracting meaningful signal features in this algorithm. If the energy ratio $L_{i, n}$ is less than 0.99, the corresponding $C_{i, n} (t)$ is recognized as a meaningful PRC and stored in the data set $C_{g} (t)$ of PRCs.²⁷ Otherwise, the corresponding PRC is eliminated as not meaningful. The flowchart of the proposed DTCWPT-IITD with the two-stage screening process is summarized in Figure 2.

Figure 2.

Flowchart of the proposed DTCWPT-IITD–based two-stage screening processes.
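Both screening stages apply the same test: a component is kept when the energy of what remains after subtracting it from its reference signal is less than 0.99 of the reference energy. The sketch below uses discrete sums in place of the integrals in equations (13) and (14); the function names are hypothetical.

```python
import numpy as np

def energy_ratio(reference, component):
    """Energy of (reference - component) relative to the energy of reference."""
    reference = np.asarray(reference, dtype=float)
    component = np.asarray(component, dtype=float)
    return np.sum((reference - component) ** 2) / np.sum(reference ** 2)

def screen(reference, components, threshold=0.99):
    """Keep the components whose energy ratio falls below the threshold."""
    return [c for c in components if energy_ratio(reference, c) < threshold]
```

In the first stage, `reference` would be the signal $x (t)$ and `components` the reconstructed packets $p_{j}^{m} (t)$; in the second stage, `reference` would be a selected packet $p_{i} (t)$ and `components` its PRCs.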

Simulation analysis using proposed DTCWPT-IITD

To demonstrate the feasibility of the proposed DTCWPT-IITD for extracting the fault features of rolling element bearings, a simulated signal $x (t)$ is constructed. The signal is sampled at 1024 Hz and comprises two carrier frequency components (150 and 400 Hz) which are modulated at 20 and 60 Hz, respectively. White noise is added to the signal, with the signal-to-noise ratio set to −4 dB. The simulated signal $x (t)$ is represented as

x (t) = \left\{ \begin{matrix} (1.5 + \sin (2 π 20 t_{1}) \cdot \sin (2 π 150 t_{1})) & 0 \leq t_{1} < 0.5 \\ (1.5 + \sin (2 π 60 t_{2}) \cdot \sin (2 π 400 t_{2})) & 0.5 \leq t_{2} < 1 \\ \text{White noise } (S / N = - 4\ \text{dB}) \end{matrix} \right.

(15)

The time-domain waveform of the simulated signal is given in Figure 3. Hilbert transform (HT) is applied to demodulate the simulated signal directly, and Figure 4 depicts the corresponding Hilbert envelope spectrum. In theory, there should be two peaks at 20 and 60 Hz in the demodulation spectrum. However, 20 and 60 Hz cannot be recognized clearly in the envelope spectrum because of the strong noise present across the entire frequency range. This shows that direct demodulation is not effective for identifying the fault frequency.

Figure 3.

Time-domain waveform of the simulated signal.

Figure 4.

Hilbert envelope spectrum of the simulated signal.
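For reference, Hilbert envelope demodulation can be sketched with NumPy alone. The example is deliberately simplified and rests on assumptions: it uses a single noise-free tone, and it reads the first segment of the simulated signal as the amplitude-modulated form $(1.5 + \sin (2 π 20 t)) \sin (2 π 150 t)$. In this clean case the envelope spectrum has a single peak at the 20 Hz modulating frequency, which is the behaviour the noisy spectrum in Figure 4 fails to show clearly.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC, double positive frequencies, drop negative ones."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def envelope_spectrum(x, fs):
    """Amplitude spectrum of the mean-removed Hilbert envelope of x."""
    envelope = np.abs(analytic_signal(x))
    envelope = envelope - envelope.mean()
    amplitude = np.abs(np.fft.rfft(envelope)) * 2.0 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, amplitude

fs = 1024                      # sampling frequency in Hz, as in the simulation
t = np.arange(fs) / fs         # one second of samples
# Assumed amplitude-modulated tone: 150 Hz carrier, 20 Hz modulation, no noise
am_tone = (1.5 + np.sin(2 * np.pi * 20 * t)) * np.sin(2 * np.pi * 150 * t)
freqs, amplitude = envelope_spectrum(am_tone, fs)
peak_hz = freqs[np.argmax(amplitude)]
```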

In order to demonstrate that DTCWPT-IITD is effective in extracting fault features, IITD with screening and DTCWPT with screening are each used to process the same signal. Figures 5 and 6 show the results. Compared with direct demodulation, both are able to reveal the modulating frequencies. However, in the result of IITD with screening, the noise in the spectrum may lead to ambiguous identification, and the result of DTCWPT with screening contains some meaningless information.

Figure 5.

Hilbert envelope spectrum by IITD with energy ratio screening.

Figure 6.

Hilbert envelope spectrum by DTCWPT with energy ratio screening.

The proposed DTCWPT-IITD is then used to extract the meaningful information from the simulated signal. Figure 7 shows the result. Comparing the spectra obtained by DTCWPT-IITD, DTCWPT with screening, and IITD with screening, the spectrum based on DTCWPT-IITD provides clearer information about the frequency components embedded in the simulated signal.²⁷ Figure 7 shows two peaks at 20 and 60 Hz, which are the modulating frequencies. Furthermore, DTCWPT-IITD eliminates redundant information more successfully than the other methods.

Figure 7.

Hilbert envelope spectrum by DTCWPT-IITD with energy ratio screening.

Fault singular value extraction based on SVD

In matrix theory, the singular values generated by SVD represent the inherent features of a matrix and possess the characteristics of scale invariance, rotation invariance, and favorable stability.¹ Therefore, the singular values of the matrix whose rows are the desired PRCs are well suited to serve as the feature vector for ELM training and testing.

The matrix consisting of the set of desired PRCs is divided into two initial feature matrices $A$ and $B$ , which are described respectively by

\begin{matrix} A = {[\begin{matrix} c_{1} & c_{2} & \dots & c_{J} \end{matrix}]}^{T} \\ B = {[\begin{matrix} c_{J + 1} & c_{J + 2} & \dots & c_{S} \end{matrix}]}^{T} \end{matrix}

(16)

If $S$ is even, $J \leq S / 2$ . Otherwise, $J \leq (S + 1) / 2$ . The substantive features of the disturbing signal are characterized by the matrices $A$ and $B$ .

Let $C$ be a real matrix with $N$ rows and $M$ columns. There exist orthogonal matrices $U$ and $V$ with the sizes of $N \times N$ and $M \times M$ such that²²

C = U Λ V^{T}

(17)

where $Λ$ is an $N \times M$ diagonal matrix with nonnegative diagonal elements. These diagonal elements $σ_{i} (i = 1, 2, \dots, p) (p = \min (N, M))$ , arranged in descending order, are termed the singular values of the matrix $C$ .

Then the initial feature matrices A and B are processed by the SVD, respectively; the obtained singular values $σ_{A, j}$ and $σ_{B, j}$ are expressed as follows

σ_{A, j} = [σ_{A, j}^{1}, σ_{A, j}^{2}, \dots, σ_{A, j}^{J}]

(18)

σ_{B, j} = [σ_{B, j}^{J + 1}, σ_{B, j}^{J + 2}, \dots, σ_{B, j}^{S}]

(19)

where $σ_{A, j}^{1} \geq σ_{A, j}^{2} \geq \dots \geq σ_{A, j}^{J}$ , $σ_{B, j}^{J + 1} \geq σ_{B, j}^{J + 2} \geq \dots \geq σ_{B, j}^{S}$ . The vector $[σ_{A, j}, σ_{B, j}]$ is chosen as the feature vector.
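The feature construction of equations (16) to (19) can be sketched as follows. The function name is hypothetical, and the sketch assumes that $0 < J < S$ and that each PRC has at least as many samples as there are rows in either matrix, so that $A$ contributes $J$ singular values and $B$ contributes $S - J$.

```python
import numpy as np

def svd_feature_vector(prcs, J):
    """Split the PRC matrix into A (first J rows) and B (remaining rows),
    then concatenate their singular values into one feature vector."""
    prcs = np.asarray(prcs, dtype=float)
    A, B = prcs[:J], prcs[J:]
    # compute_uv=False returns only the singular values, in descending order
    sigma_A = np.linalg.svd(A, compute_uv=False)
    sigma_B = np.linalg.svd(B, compute_uv=False)
    return np.concatenate([sigma_A, sigma_B])
```

Each vibration sample then yields one vector of $S$ singular values, which is the input presented to the classifier.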

Fault feature classification based on the OS-ELM

Suppose $X_{0} = \{(x_{i}, t_{i})\}_{i = 1}^{N_{0}}$ is an initial training sample set, $H_{0}$ is the hidden-layer output matrix with the size of $N_{0} \times L$ , and $T_{0}$ denotes the $N_{0} \times m$ target matrix. At the beginning, if the number of hidden nodes $L$ is less than or equal to the number of samples $N_{0}$ , the output weights that minimize $‖ H_{0} β - T_{0} ‖$ can be written as²⁸

β_{0} = K_{0}^{- 1} H_{0}^{T} T_{0}

(20)

where $K_{0} = H_{0}^{T} H_{0}$ . Suppose that a new training sample set $X_{1} = \{(x_{i}, t_{i})\}_{i = N_{0} + 1}^{N_{0} + N_{1}}$ arrives, where $N_{1}$ is the number of samples in $X_{1}$ . The new solution obtained from training on $X_{0}$ and $X_{1}$ must satisfy the following minimization problem

\min_{β} ‖ [\begin{matrix} H_{0} \\ H_{1} \end{matrix}] β - [\begin{matrix} T_{0} \\ T_{1} \end{matrix}] ‖

(21)

where $H_{1}$ is the hidden-layer output matrix with regard to $X_{1}$ , the size of $H_{1}$ is $N_{1} \times L$ , and $T_{1}$ is the $N_{1} \times m$ target matrix of $X_{1}$ . Thus, the output weights can be defined as follows

β_{1} = K_{1}^{- 1} {[\begin{matrix} H_{0} \\ H_{1} \end{matrix}]}^{T} [\begin{matrix} T_{0} \\ T_{1} \end{matrix}]

(22)

where

K_{1} = {[\begin{matrix} H_{0} \\ H_{1} \end{matrix}]}^{T} [\begin{matrix} H_{0} \\ H_{1} \end{matrix}] = K_{0} + H_{1}^{T} H_{1}

For the efficiency of sequential learning, it is reasonable to express $β_{1}$ as a function of $β_{0}$ , $K_{1}$ , $H_{1}$ , and $T_{1}$ , which is independent of the original data set

\begin{array}{l} {[\begin{matrix} H_{0} \\ H_{1} \end{matrix}]}^{T} [\begin{matrix} T_{0} \\ T_{1} \end{matrix}] = H_{0}^{T} T_{0} + H_{1}^{T} T_{1} \\ = K_{0} β_{0} + H_{1}^{T} T_{1} \\ = K_{1} β_{0} - H_{1}^{T} H_{1} β_{0} + H_{1}^{T} T_{1} \end{array}

(23)

$β_{1}$ can be expressed as follows by combining equations (22) and (23)

β_{1} = K_{1}^{- 1} {[\begin{matrix} H_{0} \\ H_{1} \end{matrix}]}^{T} [\begin{matrix} T_{0} \\ T_{1} \end{matrix}] = β_{0} + K_{1}^{- 1} H_{1}^{T} (T_{1} - H_{1} β_{0})

(24)

For sequential learning, when the $(k + 1)$th new chunk of data arrives, the recursive method is implemented to acquire the updated solution.²⁸ $K_{k + 1}$ and the output weights $β_{k + 1}$ can be updated by²⁹

K_{k + 1} = K_{k} + H_{k + 1}^{T} H_{k + 1}

(25)

β_{k + 1} = β_{k} + K_{k + 1}^{- 1} H_{k + 1}^{T} (T_{k + 1} - H_{k + 1} β_{k})

(26)
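The recursion can be sketched compactly. The class below is an illustrative implementation under stated assumptions, not the exact configuration used in the experiments: hidden-layer weights and biases are drawn uniformly from [−1, 1], the activation is a sigmoid, $K$ is kept explicitly and solved against at every step, and the class and method names are hypothetical.

```python
import numpy as np

def hidden_output(X, W, b):
    """Sigmoid hidden-layer output matrix H for inputs X."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

class OSELM:
    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # random hidden parameters are fixed once and never trained
        self.W = rng.uniform(-1.0, 1.0, (n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, n_hidden)
        self.K = None
        self.beta = None

    def init_fit(self, X0, T0):
        """Initial phase, eq. (20); needs at least as many samples as hidden nodes."""
        H0 = hidden_output(X0, self.W, self.b)
        self.K = H0.T @ H0
        self.beta = np.linalg.solve(self.K, H0.T @ T0)

    def partial_fit(self, X, T):
        """Sequential phase for one new chunk, eqs. (25) and (26)."""
        H = hidden_output(X, self.W, self.b)
        self.K = self.K + H.T @ H
        self.beta = self.beta + np.linalg.solve(self.K, H.T @ (T - H @ self.beta))

    def predict(self, X):
        return hidden_output(X, self.W, self.b) @ self.beta
```

After any number of calls to `partial_fit`, `beta` coincides (up to rounding) with the batch least-squares solution on all data seen so far, which is why the chunk size affects training time rather than the solution itself. With one-hot targets, the predicted class is the index of the largest output.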

Experimental analysis and discussion

In this section, an experiment on the fault identification of rolling element bearings is carried out to further verify the feasibility and effectiveness of the proposed method. The vibration data of rolling element bearings are provided by Case Western Reserve University (CWRU).³⁰ The test stand is composed of a three-phase induction motor, a torque transducer/encoder, and a dynamometer. Fault locations cover the inner race, outer race, and rolling element, and the defect sizes are 0.007, 0.014, 0.021, and 0.028 in. Vibration data were collected at a sampling frequency of 12 kHz. The characteristic frequencies of the three defect types are obtained from the geometric parameters of the bearing: 141.17 Hz for the rolling element defect, 162.19 Hz for the inner race defect, and 107.37 Hz for the outer race defect.

The defect with a diameter of 0.014 in is chosen to test and verify the validity of the presented method. Figure 8 shows the time-domain waveforms for the different bearing health conditions (i.e. normal, outer race defect, inner race defect, and rolling element defect), as well as their corresponding fast Fourier transform (FFT) spectra. As Figure 8 shows, it is hard to judge the condition of the rolling bearing from these time-domain waveforms and the corresponding spectra.
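The quoted characteristic frequencies can be reproduced from the standard kinematic formulas for a bearing with a stationary outer race. The geometry used below (9 balls, ball diameter 0.3126 in, pitch diameter 1.537 in, zero contact angle) and the shaft frequency of 29.95 Hz (1797 r/min) are assumptions: they are the values commonly quoted for the CWRU drive-end bearing and are not stated in this article. Taking the rolling element defect frequency as twice the ball spin frequency is likewise an assumption.

```python
import math

def bearing_frequencies(shaft_hz, n_balls, ball_d, pitch_d, contact_angle=0.0):
    """Characteristic defect frequencies (Hz) for a bearing with a fixed outer race."""
    r = ball_d / pitch_d * math.cos(contact_angle)
    outer = 0.5 * n_balls * shaft_hz * (1.0 - r)   # ball pass frequency, outer race
    inner = 0.5 * n_balls * shaft_hz * (1.0 + r)   # ball pass frequency, inner race
    # a ball defect strikes both races per ball revolution: twice the ball spin frequency
    ball = pitch_d / ball_d * shaft_hz * (1.0 - r ** 2)
    return {"outer": outer, "inner": inner, "ball": ball}

# Assumed geometry and speed (not given in the article)
freqs = bearing_frequencies(shaft_hz=29.95, n_balls=9, ball_d=0.3126, pitch_d=1.537)
```

Under these assumptions the three values agree with the frequencies quoted above to within about 0.01 Hz.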

Figure 8.

Waveform of four conditions of bearing and their FFT spectra: (a, b) normal, (c, d) outer race defect, (e, f) inner race defect, and (g, h) rolling element defect.

To examine the effectiveness of the proposed method, direct demodulation by the Hilbert transform, denoted HT, is applied for comparison. A second comparison uses DTCWPT and IITD without the two-step screening, denoted DTCWPT-IITD without screening. Figure 9 shows the results for the healthy condition using HT, DTCWPT-IITD without screening, and the proposed method. The characteristic frequency of a normal rolling element bearing is determined by the shaft rotational speed ( $f_{N} = 30 Hz$ ). From Figure 9, it can be observed that the characteristic frequency of the normal condition appears in the results of all three methods. Nevertheless, the proposed DTCWPT-IITD method (Figure 9(c)) provides the best spectral resolution owing to its efficient noise reduction and its PRC selection, in which the energy ratio is used to highlight the feature characteristics.

Figure 9.

The results for normal using different methods: (a) HT, (b) DTCWPT-IITD without screening, and (c) DTCWPT-IITD.

Figure 10 depicts the results of HT, DTCWPT-IITD without screening, and DTCWPT-IITD for an outer race fault. In this case, the characteristic frequency should be 107.37 Hz ( $f_{O} = 107.37 Hz$ ). The characteristic frequency can be recognized in the results of each method. Compared with HT and DTCWPT-IITD without screening in Figure 10(a) and (b), DTCWPT-IITD highlights the characteristic components and effectively removes the noise hidden in the signal. Figure 11 presents the results of the three methods for an inner race fault, for which the characteristic frequency should be 162.19 Hz ( $f_{I} = 162.19 Hz$ ). The result of DTCWPT-IITD, in which the characteristic frequency is dominant, is superior to those of the other methods.

Figure 10.

The results for defect of outer race using three methods: (a) HT, (b) DTCWPT-IITD without screening, and (c) DTCWPT-IITD.

Figure 11.

The results for defect of inner race using three methods: (a) HT, (b) DTCWPT-IITD without screening, and (c) DTCWPT-IITD.

The results of HT, DTCWPT-IITD without screening, and DTCWPT-IITD for the rolling element defect are presented in Figure 12. The characteristic frequency $f_{R}$ is 141.17 Hz. The diagnosis of rolling element faults may be the most difficult because the corresponding signals are strongly nonstationary and nonlinear. The proposed DTCWPT-IITD provides the clearest information about the frequency components in Figure 12(c) because of its effective noise reduction and PRC selection using the proposed energy ratio.

Figure 12.

The results for rolling element defect using different methods: (a) HT, (b) DTCWPT-IITD without screening, and (c) DTCWPT-IITD.

In order to provide an intuitive diagnostic result, OS-ELM is used to diagnose the bearing condition after DTCWPT-IITD. Analysis of the desired PRCs obtained by DTCWPT-IITD shows that these PRCs contain the main information about the working condition. Hence, the matrix whose rows are the desired PRCs of one sample is suitable as the original feature matrix. Next, this feature matrix is processed by SVD to produce the feature vector whose elements are the singular values. Thus, four matrices consisting of the feature vectors are generated for the four different bearing conditions.

The feature vectors are divided into training samples and testing samples. For each of the four bearing conditions, 20 groups of feature vectors are chosen for training and the remaining 20 groups are used for testing. Table 1 shows the distribution of these data sets in detail. The trained models are used to classify the test samples, and the bearing working state is then identified from the output of the OS-ELM classifier. The number of hidden-layer nodes is set to 25, the activation function is a sigmoid function, and the training mode is one-by-one.

Table 1.

Description of the data sets used for fault identification.

Data set	The number of training samples	The number of testing samples	Defect size (in)	Operating condition	Label of classification
A	20	20	0	Normal	1
	20	20	0.007	Outer ring	2
	20	20	0.007	Inner ring	3
	20	20	0.007	Rolling element	4
B	20	20	0	Normal	1
	20	20	0.021	Outer ring	2
	20	20	0.021	Inner ring	3
	20	20	0.021	Rolling element	4
C	20	20	0	Normal	1
	20	20	0.007	Outer ring	2
	20	20	0.007	Inner ring	3
	20	20	0.007	Rolling element	4
	20	20	0.021	Outer ring	5
	20	20	0.021	Inner ring	6
	20	20	0.021	Rolling element	7

IITD: improved intrinsic time-scale decomposition; OS-ELM: online sequential extreme learning machine; DTCWPT: dual-tree complex wavelet packet transform; WPT: wavelet packet transform; ITD: intrinsic time-scale decomposition.

For all the data sets, it can be noted from Table 2 that the proposed method, which is composed of DTCWPT-IITD, SVD, and OS-ELM, provides better classification accuracy (99.44%, 99.42%, 99.36%) than DTCWPT (96.25%, 96.19%, 91.75%) and IITD (96.11%, 96.14%, 91.67%). DTCWPT-IITD combines the advantages of DTCWPT and IITD, and it additionally eliminates most of the redundant features through the two-stage screening process. Hence, the proposed method has a significantly stronger ability to extract the features of bearing conditions than the traditional methods.

Table 2.

The average classification accuracy of OS-ELM based on relevant method.

Algorithms	Classification accuracy of data set A (%)	Classification accuracy of data set B (%)	Classification accuracy of data set C (%)
IITD + OS-ELM	96.11	96.14	91.67
DTCWPT + OS-ELM	96.25	96.19	91.75
WPT-IITD + OS-ELM	97.75	97.81	94.97
DTCWPT-ITD + OS-ELM	97.92	97.92	95.31
DTCWPT-IITD + OS-ELM	99.44	99.42	99.36

Moreover, the test accuracy of DTCWPT-IITD + OS-ELM (99.44%, 99.42%, 99.36%) is higher than those of WPT-IITD (97.75%, 97.81%, 94.97%) and DTCWPT-ITD (97.92%, 97.92%, 95.31%). This can be explained by the fact that DTCWPT possesses near shift-invariance and mitigates the problem of frequency aliasing, as opposed to the traditional WPT. Similarly, IITD overcomes the problems with the interpolation method and the decomposition termination condition that remain in the ITD method. Therefore, the features extracted using DTCWPT-IITD and SVD are more representative than those extracted using WPT-IITD and DTCWPT-ITD.

The classification accuracies for data set C obtained with all the methods (ranging from 91.67% to 99.36%) are lower than those for data set A or B. This is because data set C has more classes (seven) than A or B (four). As the number of fault classes increases, the advantage of the proposed method in classification accuracy becomes more prominent. For data set C, which has seven classes (Table 1), the performance of DTCWPT-IITD + OS-ELM is significantly better than that of the other methods. For data sets A and B, which have four classes, the classification accuracies of the proposed method are also higher than those of the other methods, but the margin is not as large as for data set C.

In order to assess the generalization capability of the proposed method, four groups of experiments are carried out to further study its application to fault identification. A data set containing 160 samples is used, of which 20, 40, 60, or 80 samples are used for training and the remainder for testing; the calculation procedure is the same as above. Table 3 shows the influence of the number of training samples on the identification accuracy. It can be seen from Table 3 that a larger number of training samples yields higher identification accuracy, and the highest classification accuracy (100%) is reached when the training set contains 60 samples. It can also be found that the proposed method can still classify the four conditions of the rolling bearing when the number of training samples is decreased, which confirms that the method can be applied successfully to fault identification even when only limited training samples are available. In addition, when the training set and the test set each contain 10 samples, the classification accuracy obtained using the proposed method is 89.5%.

Table 3.

The results of faults identification using different training sets and testing sets.

Methods	Training sample	Test sample	Overall classification accuracy (%)
DTCWPT-IITD + SVD + OS-ELM	20	140	93.53
	40	120	98.02
	60	100	100
	80	80	100

IITD: improved intrinsic time-scale decomposition; OS-ELM: online sequential extreme learning machine; DTCWPT: dual-tree complex wavelet packet transform; SVD: singular value decomposition.

To evaluate the performance of OS-ELM, the classification accuracies of OS-ELM in different chunk-by-chunk training modes are compared with those of the original batch ELM and ANN. For the chunk-by-chunk training mode, constant chunk sizes of 1 and 5 as well as a chunk size that changes randomly within [20, 140] are considered. Table 4 shows the average classification accuracy and training time.

From Table 4, it is clear that the original batch ELM classifier provides better classification performance than the back-propagation neural network algorithm (BP) with the same feature extraction. The training time also deserves attention: the ELM algorithm runs around 50 times faster than BP in training. These results show that ELM provides better generalization performance at a higher learning speed than the traditional BP algorithm. Hence, the combination of DTCWPT-IITD and ELM achieves outstanding classification results.

Table 4.

Performance comparison of BP, ELM, and OS-ELM.

Data sets	Algorithms	Training mode	Time (s)	Accuracy (%)
A	BP	Batch	3.2412	93.06
	ELM	Batch	0.0702	99.08
	OS-ELM	1-by-1	0.1716	99.03
		5-by-5	0.1404	98.75
		[10, 20]	0.1036	99.07
B	BP	Batch	3.2378	92.78
	ELM	Batch	0.0736	99.01
	OS-ELM	1-by-1	0.1748	98.82
		5-by-5	0.1432	98.75
		[10, 20]	0.1072	98.92
C	BP	Batch	5.4110	89.86
	ELM	Batch	0.0936	98.89
	OS-ELM	1-by-1	0.2140	98.71
		5-by-5	0.1592	98.56
		[10, 20]	0.1248	98.79

OS-ELM: online sequential extreme learning machine; ELM: extreme learning machine; BP: back-propagation neural network algorithm.
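The statement that ELM trains around 50 times faster than BP can be checked directly from the batch-mode times in Table 4. The short sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# Batch-mode training times in seconds, copied from Table 4
bp_time = {"A": 3.2412, "B": 3.2378, "C": 5.4110}
elm_time = {"A": 0.0702, "B": 0.0736, "C": 0.0936}

# Ratio of BP training time to ELM training time for each data set
speedup = {name: bp_time[name] / elm_time[name] for name in bp_time}
mean_speedup = sum(speedup.values()) / len(speedup)

for name, ratio in speedup.items():
    print(f"data set {name}: ELM trains about {ratio:.1f} times faster than BP")
print(f"mean over the three data sets: about {mean_speedup:.1f} times")
```

The ratios lie between roughly 44 and 58, which is consistent with the "around 50 times" figure quoted in the text.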

ELM and OS-ELM with different training modes have similar classification accuracies. In terms of training time, the one-by-one sequential mode takes the longest, and the training speed of OS-ELM increases as the chunk size increases. As shown in Table 4, for data set A the training time falls from 0.1716 to 0.1404 s as the chunk size increases from 1 to 5. This comparison shows that OS-ELM can be adapted to the way the data arrive without sacrificing accuracy.
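The reason the chunk size has little effect on accuracy can be seen in a minimal sketch of chunk-by-chunk training. It assumes the standard recursive least-squares form of OS-ELM with sigmoid hidden nodes. The synthetic four-class data, the 10 hidden nodes, the 40-sample initial batch, and the helper `train_in_chunks` are hypothetical choices for illustration and are not the configuration used in the experiments.

```python
import numpy as np

class OSELM:
    """Minimal OS-ELM sketch with sigmoid hidden nodes (illustrative only)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer weights and biases are random and never retrained
        self.W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.P = None      # running inverse of H^T H
        self.beta = None   # output weights

    def hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def init_fit(self, X0, T0):
        # Initial batch: needs at least n_hidden samples so H0^T H0 is invertible
        H0 = self.hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0)
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        # Recursive least-squares update for one chunk of any size
        H = self.hidden(X)
        K = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta


def train_in_chunks(X, T, n_hidden, n_init, chunk_size):
    """Initialise on the first n_init samples, then feed fixed-size chunks."""
    model = OSELM(X.shape[1], n_hidden)
    model.init_fit(X[:n_init], T[:n_init])
    for start in range(n_init, len(X), chunk_size):
        stop = start + chunk_size
        model.partial_fit(X[start:stop], T[start:stop])
    return model


# Synthetic four-class data standing in for singular-value feature vectors
rng = np.random.default_rng(1)
centers = rng.normal(size=(4, 8))
labels = np.repeat(np.arange(4), 40)
X = centers[labels] + 0.5 * rng.normal(size=(160, 8))
order = rng.permutation(160)
X, labels = X[order], labels[order]
T = np.eye(4)[labels]  # one-hot targets

one_by_one = train_in_chunks(X, T, n_hidden=10, n_init=40, chunk_size=1)
five_by_five = train_in_chunks(X, T, n_hidden=10, n_init=40, chunk_size=5)

# Batch ELM reference: same hidden layer (same seed), least squares on all data
H_all = one_by_one.hidden(X)
beta_batch, *_ = np.linalg.lstsq(H_all, T, rcond=None)
```

In this sketch the one-by-one model, the five-by-five model, and the batch least-squares solution give the same outputs up to rounding error, because the recursive update is algebraically equivalent to solving the full least-squares problem. Only the number of update steps, and hence the training time, depends on the chunk size. Class labels can be read off with `one_by_one.predict(X).argmax(axis=1)`.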

The proposed approach has been tested on rolling element bearing data sets provided by the Case Western Reserve University bearing data center, in which the percentages of bearing failure instances and instances of correct performance are either roughly equal or different. In practice, however, it is not possible to obtain balanced data sets under real operating conditions.³¹ How to choose and identify the most suitable classification technique is an open issue. Different metrics for comparing classification techniques on imbalanced data sets will be analyzed in future work, and the influence of the level of imbalance between fault and good working conditions in the data set will be discussed.
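Why the choice of metric matters on imbalanced data can be shown with a small, purely hypothetical example. The sketch below compares overall accuracy with balanced accuracy (the mean of the per-class recalls) for a classifier that never reports a fault. The 95/5 split and the function names are assumptions for illustration; neither this metric nor this data was evaluated in the article.

```python
def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    recall = {}
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recall[cls] = hits / total
    return recall

def overall_accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    recall = per_class_recall(y_true, y_pred)
    return sum(recall.values()) / len(recall)

# Hypothetical imbalanced test set: 95 normal samples and 5 faulty samples
y_true = ["normal"] * 95 + ["fault"] * 5
y_pred = ["normal"] * 100  # a classifier that never reports a fault

acc = overall_accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"overall accuracy:  {acc:.2f}")
print(f"balanced accuracy: {bal:.2f}")
```

Here the overall accuracy is 0.95 although every fault is missed, whereas the balanced accuracy is 0.50, which is the kind of discrepancy that a metric comparison on imbalanced data sets would need to account for.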

Conclusion

This article proposed a novel approach based on DTCWPT-IITD, SVD, and OS-ELM for identifying bearing working conditions. Meaningful features are extracted from the measured signal by combining IITD with DTCWPT as a preprocessor and a two-step screening process based on the energy ratio, and OS-ELM is used to identify the running state of rolling element bearings. The effectiveness of the proposed fault diagnosis technique is verified by applying it to simulated signals and to actual measured signals under different states. In fault feature extraction, unlike the results of the DTCWPT or IITD method alone, which contain many redundant components, most of them heavily corrupted by noise, the DTCWPT-IITD technique can effectively remove noise from the signal and extract its meaningful frequency features. A comparison of classification accuracy between the proposed method and related methods is also given. It shows that the proposed method based on DTCWPT-IITD, SVD, and OS-ELM provides better classification accuracy.

Footnotes

Handling Editor: Elsa de Sa Caetano

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Science Foundation of China (grant no. 51577007), the State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources (grant no. LAPS15019), and the Fundamental Research Foundations for the Central Universities (grant no. 2014JBZ017).

References

1. Sui, Osman, Wang. An adaptive envelope spectrum technique for bearing fault detection. Meas Sci Technol 2014; 25: 095004.

2. Zhang, Zhou. Multi-fault diagnosis for rolling element bearings based on ensemble empirical mode decomposition and optimized support vector machines. Mech Syst Signal Pr 2013; 41: 127–140.

3. Jardine AKS, Lin, Banjevic. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech Syst Signal Pr 2006; 20: 1483–1510.

4. Villa, Reñones, Perán, et al. Angular resampling for vibration analysis in wind turbines under non-linear speed fluctuation. Mech Syst Signal Pr 2011; 25: 2157–2168.

5. Liu, Huang. Adaptive feature extraction using sparse coding for machinery fault diagnosis. Mech Syst Signal Pr 2011; 25: 558–574.

6. Peng, Chu. Application of the wavelet transform in machine condition monitoring and fault diagnostics: a review with bibliography. Mech Syst Signal Pr 2004; 18: 199–221.

7. Fan, Zuo. Gearbox fault detection using Hilbert and wavelet packet transform. Mech Syst Signal Pr 2006; 20: 966–982.

8. Sweldens. The lifting scheme: a construction of second generation wavelets. Siam J Math Anal 1998; 29: 511–546.

9. Zhou, et al. Mechanical fault diagnosis based on redundant second generation wavelet packet transform, neighborhood rough set and support vector machine. Mech Syst Signal Pr 2012; 28: 608–621.

10. Kingsbury. The dual-tree complex wavelet transform: a new technique for shift invariance and directional filters. Image Process 1998; 86: 319–322.

11. Weickert, Benjaminsen, Kiencke. Analytic wavelet packets: combining the dual-tree approach with wavelet packets for signal analysis and filtering. IEEE T Signal Proces 2009; 57: 493–502.

12. Huang, Shen, Long, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. P Roy Soc A-Math Phy 1998; 454: 903–995.

13. Huang. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv Adapt Data Anal 2009; 1: 1–41.

14. Smith. The local mean decomposition and its application to EEG perception data. J R Soc Interface 2005; 2: 443–454.

15. Chen, et al. A demodulating approach based on local mean decomposition and its applications in mechanical fault diagnosis. Meas Sci Technol 2011; 22: 055704.

16. Park, Looney, Van Hulle, et al. The complex local mean decomposition. Neurocomputing 2011; 74: 867–875.

17. Liu, Zhang, Han, et al. A new wind turbine fault diagnosis method based on the local mean decomposition. Renew Energ 2012; 48: 411–415.

18. Jiang, Chen, et al. Application of the intrinsic time-scale decomposition method to fault diagnosis of wind turbine bearing. J Vib Control 2012; 18: 240–245.

19. Porteiro, Collazo, Patiño, et al. Diesel engine condition monitoring using a multi-net neural network system with nonintrusive sensors. Appl Therm Eng 2011; 31: 4097–4105.

20. Samanta, Al-Balushi, Al-Araimi. Artificial neural networks and support vector machines with genetic algorithm for bearing fault detection. Eng Appl Artif Intel 2003; 16: 657–665.

21. Hao, Peng, Feng, et al. Application of support vector machine based on pattern spectrum entropy in fault diagnostics of rolling element bearings. Meas Sci Technol 2011; 22: 045708.

22. Junhong, Fengrong, et al. A fault diagnosis approach for diesel engine valve train based on improved ITD and SDAG-RVM. Meas Sci Technol 2014; 26: 025003.

23. Huang, Zhu, Siew. Extreme learning machine: theory and applications. Neurocomputing 2006; 70: 489–501.

24. Liang, Huang, Saratchandran, et al. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE T Neural Networ 2006; 17: 1411–1423.

25. Yang, Pan, et al. A roller bearing fault diagnosis method based on the improved ITD and RRVPMCD. Measurement 2014; 55: 255–264.

26. Liu, Zhang. A fault diagnosis approach for diesel engines based on self-adaptive WVD, improved FCBF and PECOC-RVM. Neurocomputing 2016; 177: 600–611.

27. Law, Kim, Liew WYH, et al. An approach to monitoring the thermomechanical behavior of a spindle bearing system using acoustic emission (AE) energy. Int J Precis Eng Man 2013; 14: 1169–1175.

28. Yin, Wang. An online sequential extreme learning machine for tidal prediction based on improved Gath-Geva fuzzy segmentation. Neurocomputing 2016; 174: 85–98.

29. Sun, Yuan, Wang. An OS-ELM based distributed ensemble classification framework in P2P networks. Neurocomputing 2016; 74: 2438–2443.

30. http://csegroups.case.edu/bearingdatacenter

31. Santos, Maudes, Bustillo. Identifying maximum imbalance in datasets for fault diagnosis of gearboxes. J Intell Manuf. Epub ahead of print 15 June 2015. DOI: https://doi.org/10.1007/s10845-015-1110-0.