Abstract
The fault diagnosis of rolling element bearings is essential for ensuring the safe operation of rotating machinery. Targeting the nonstationary characteristics of the vibration signals of rolling element bearings, this article proposes a novel approach to bearing fault recognition based on the dual-tree complex wavelet packet transform, improved intrinsic time-scale decomposition, and the online sequential extreme learning machine. First, a feature extraction method is presented that combines improved intrinsic time-scale decomposition with the dual-tree complex wavelet packet transform as a preprocessor and a two-step screening process based on the energy ratio, so that the vibration signal is adaptively decomposed into a set of proper rotation components. Second, a matrix is formed from the selected proper rotation components, and singular value decomposition is applied to it to obtain the singular values as the feature vector. Finally, the singular values are input to the online sequential extreme learning machine to diagnose the faults of rolling element bearings. The effectiveness of the proposed method is demonstrated experimentally: the results show that it can effectively extract the fault characteristics and accurately identify the fault patterns.
Introduction
Assessing the working condition and identifying faults are critically important for ensuring the safe operation of rolling element bearings in rotating machines. Bearing fault detection can be undertaken using different information carriers such as vibration signals, lubricant information, and acoustic and temperature data. 1 Among them, the vibration signal contains abundant fault information. Consequently, vibration-based analysis is widely used for diagnosing faults of rolling element bearings. 2 Numerous vibration-based analytical methods, covering the time domain and the frequency domain, have been presented in the literature for the fault diagnosis of rolling element bearings. 3 In addition, another solution has been proposed by Villa for bearing diagnostic tests. 4 However, the vibration signals of rolling element bearings are nonlinear and nonstationary, which makes it very difficult to detect a bearing fault clearly in the time domain or the frequency domain alone. 5
Wavelet transform (WT) is a multi-resolution analysis method that is widely used in machinery fault diagnosis and identification. 6 However, WT cannot subdivide the high-frequency section of the signal, where the modulation information of the faulty bearing resides. Wavelet packet transform (WPT) is an extension of WT that can split the high-frequency band and provides better time-frequency localization of signals. 7 Second-generation wavelet transform (SGWT) is a new wavelet construction method. 8 Compared with classical WT, SGWT provides an entirely spatial domain interpretation of the transform. The time-frequency resolution of SGWT varies with the decomposition level: it gives good time resolution and poor frequency resolution in the high-frequency sub-bands, and good frequency resolution and poor time resolution in the low-frequency sub-bands. 9 Dual-tree complex wavelet transform (DTCWT) has also been presented. 10 DTCWT possesses good properties such as nearly analytic basis functions and near shift-invariance. However, DTCWT cannot split the high-frequency band. To obtain a higher resolution in the high-frequency sub-bands, the dual-tree complex wavelet packet transform (DTCWPT) has been constructed. It overcomes this shortcoming of DTCWT, so the original signal can be further decomposed into approximation and detail components. 11
Empirical mode decomposition (EMD) provides a novel way to handle nonstationary signals in the time-frequency domain. It is based on the local characteristic time scales of a signal, self-adaptively decomposes a complicated signal into a set of intrinsic mode functions (IMFs), and can represent local characteristics better than the wavelet method. 12 However, when EMD is applied to nonstationary signals, the original signal cannot be decomposed accurately because of mode mixing, end effects, and unexplainable negative frequencies. By adding noise to the original signal and repeatedly averaging the resulting IMFs, Wu and Huang 13 developed ensemble empirical mode decomposition (EEMD) to improve EMD. Although EEMD effectively solves the mode-mixing problem, computing a sufficiently large ensemble mean is time consuming. To address some shortcomings of EMD, a self-adaptive signal processing method named local mean decomposition (LMD) was proposed. 14 Some researchers have applied LMD to the fault diagnosis of rotating machines and shown that LMD performs better than EMD with respect to mode mixing, end effects, and so on.15–17 However, LMD has its own inherent defects: when it is used to process transient impact signals, the algorithm usually does not converge. Intrinsic time-scale decomposition (ITD) is a novel signal processing method with high computational efficiency. 18 It decomposes the signal into a set of proper rotation components (PRCs) based on the local characteristic time scale. ITD can extract more meaningful features of the vibration signal than EMD and LMD.
Artificial neural networks (ANNs) have been widely used in the pattern recognition of machine conditions. 19 However, ANN methods have drawbacks such as local optimal solutions, a low convergence rate, obvious over-fitting, and poor generalization when the number of samples is limited. 20 Support vector machines (SVMs) generalize better than ANNs for pattern recognition and guarantee that the local and global optimal solutions coincide. 21 However, SVM still cannot provide a perfect solution because of the low sparsity of the model and because the margin trade-off parameter must be estimated. 22 More recently, the extreme learning machine (ELM) has been developed as an improved learning method for single-hidden layer feed-forward neural networks (SLFNs). 23 Compared with traditional feed-forward neural network algorithms, ELM has been shown to have a higher learning speed and better generalization performance. Online sequential extreme learning machine (OS-ELM) is an extension of ELM that integrates online learning with ELM; it can learn samples one-by-one or chunk-by-chunk with fixed or varying chunk sizes. 24 Furthermore, OS-ELM learns faster than other sequential algorithms and shows better generalization performance on numerous benchmarks related to fault recognition.
In order to extract meaningful fault features and obtain high accuracy of fault classification, a novel hybrid method based on DTCWPT-improved intrinsic time-scale decomposition (IITD), singular value decomposition (SVD), and OS-ELM is presented for multi-fault diagnosis of rolling element bearings. The vibration signal is adaptively decomposed into a number of PRCs by DTCWPT-IITD; two-step screening processes based on the energy ratio are introduced to carry out the screening of PRCs and remove meaningless feature components. The matrix is formed by different PRCs and SVD is used to decompose the matrix to obtain the singular value as eigenvector. Singular values are input to OS-ELM to specify the fault type.
The rest of the article is organized as follows. Section “Fault feature extraction based on proposed DTCWPT-IITD” presents the fault feature extraction based on DTCWPT and IITD. Section “Fault singular value extraction based on SVD” describes the singular value extraction based on SVD. In section “Fault feature classification based on the OS-ELM,” OS-ELM is used to classify the fault patterns based on the selected features. Section “Experimental analysis and discussion” presents the experimental results and analysis. Finally, the conclusion is drawn in section “Conclusion.”
Fault feature extraction based on proposed DTCWPT-IITD
Fundamental of DTCWPT
DTCWPT is constructed from DTCWT by repeatedly decomposing both the low-frequency and the high-frequency sub-bands at each scale. It is implemented by two parallel and independent WPTs that use two different sets of low-pass and high-pass filters. DTCWPT is nearly shift-invariant and reduces frequency mixing, which makes it well suited to detecting bearing faults. The decomposition and reconstruction of DTCWPT are shown in Figure 1.
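The packet idea, splitting both the low-pass and the high-pass branch at every level, can be illustrated with a minimal sketch. It is not the dual-tree transform itself: it assumes a single tree with orthonormal Haar filters instead of the two dedicated filter sets of DTCWPT, and the function name `haar_packet` is hypothetical.

```python
import numpy as np

def haar_packet(signal, levels):
    """Full wavelet packet tree using orthonormal Haar filters (single tree only)."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        nodes = next_nodes
    return nodes

# Hypothetical test signal: 256 samples of white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
nodes = haar_packet(x, levels=3)
total_energy = sum(float(np.sum(n ** 2)) for n in nodes)
print(len(nodes), total_energy, float(np.sum(x ** 2)))
```

A three-level decomposition yields eight terminal nodes, and because the Haar filters are orthonormal the node energies sum to the signal energy. The nodes are returned in filter-bank order, which is not the same as ascending frequency order.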

Decomposition and reconstruction of DTCWPT:
The decomposition of DTCWPT is performed based on wavelet packet transform. The original signal is decomposed by DTCWPT with the analysis filters
R-tree decomposition
I-tree decomposition
where
The reconstruction of DTCWPT is expressed as 22
R-tree reconstruction
I-tree reconstruction
where
Improved ITD method
Given a signal
where
Let
where equation (7) expresses the linear interpolation of
To obtain a meaningful PRC, the standard deviation (SD) criterion is introduced for the PRC.
Then
Finally, the given signal
DTCWPT-IITD method for fault feature extraction
In this method, the combination of IITD with DTCWPT as a preprocessor and a two-stage screening process based on energy criteria is called DTCWPT-IITD. By computing the j-level dual-tree complex wavelet packet decomposition, the signal
From
If the value of energy ratio
As a part of this algorithm, energy ratio is important in extracting meaningful signal features. For those with energy ratio

Flowchart of the proposed DTCWPT-IITD–based two-stage screening processes.
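One stage of energy-ratio screening can be sketched as follows. The exact definition of the energy ratio and the threshold values are not reproduced in this text, so the sketch assumes the ratio is the energy of a component divided by the energy of a reference signal, and that components below a chosen threshold are discarded. The function names and the threshold of 0.1 are hypothetical.

```python
import numpy as np

def energy_ratio(components, reference):
    """Energy of each row of `components` divided by the energy of `reference` (assumed definition)."""
    components = np.atleast_2d(np.asarray(components, dtype=float))
    reference_energy = np.sum(np.asarray(reference, dtype=float) ** 2)
    return np.sum(components ** 2, axis=1) / reference_energy

def screen(components, reference, threshold):
    """Keep only the components whose energy ratio reaches the threshold."""
    components = np.atleast_2d(np.asarray(components, dtype=float))
    ratios = energy_ratio(components, reference)
    return components[ratios >= threshold], ratios

# Hypothetical example: one strong and one weak component.
t = np.arange(1000) / 1000.0
strong = np.sin(2 * np.pi * 20 * t)
weak = 0.1 * np.sin(2 * np.pi * 60 * t)
kept, ratios = screen([strong, weak], reference=strong + weak, threshold=0.1)
print(ratios, kept.shape)
```

In a two-stage scheme the same function would be applied twice, first to the sub-band signals and then to the PRCs obtained from the retained sub-bands, each time with its own reference signal and threshold.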
Simulation analysis using proposed DTCWPT-IITD
To demonstrate feasibility of the proposed DTCWPT-IITD for extracting the fault feature of rolling element bearings, a simulation signal
The time-domain waveform of the simulated signal is given in Figure 3. Hilbert transform (HT) is applied to demodulate the simulated signal directly, and Figure 4 depicts the corresponding Hilbert envelope spectrum. In theory, the demodulation spectrum should contain two peaks, at 20 and 60 Hz. However, 20 and 60 Hz cannot be recognized clearly in the spectrum because strong noise is present over the entire frequency range. This shows that direct demodulation is not effective for identifying the fault frequency.

Time-domain waveform of the simulated signal.

Hilbert envelope spectrum of the simulated signal.
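Hilbert envelope demodulation itself can be sketched with NumPy alone. The signal below is a hypothetical, noise-free stand-in, not the simulated signal of this article: a 400 Hz carrier amplitude-modulated at 20 and 60 Hz and sampled at 2 kHz. The function names are also assumptions.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal of a real, even-length sequence."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0           # keep the DC bin
    weights[n // 2] = 1.0      # keep the Nyquist bin
    weights[1:n // 2] = 2.0    # double positive frequencies; negative ones stay zero
    return np.fft.ifft(spectrum * weights)

def envelope_spectrum(x, fs):
    """Amplitude spectrum of the mean-removed Hilbert envelope."""
    envelope = np.abs(analytic_signal(x))
    envelope = envelope - envelope.mean()
    magnitude = np.abs(np.fft.rfft(envelope)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, magnitude

fs = 2000.0                    # assumed sampling frequency in Hz
t = np.arange(2000) / fs       # one second of data, 1 Hz resolution
modulation = 1.0 + 0.4 * np.cos(2 * np.pi * 20 * t) + 0.4 * np.cos(2 * np.pi * 60 * t)
x = modulation * np.cos(2 * np.pi * 400 * t)

freqs, magnitude = envelope_spectrum(x, fs)
top_two = sorted(freqs[np.argsort(magnitude)[-2:]])
print(top_two)
```

Without noise the two largest envelope-spectrum lines fall at the modulating frequencies. Adding broadband noise to `x` raises the floor of the spectrum, which is the situation that motivates the decomposition and screening steps.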
To demonstrate that DTCWPT-IITD is effective in extracting fault features, IITD with screening and DTCWPT with screening are each used to process the same signal. Figures 5 and 6 show the results. Compared with direct demodulation, both are able to reveal the modulating frequencies. However, in the result of IITD with screening, the noise in the spectrum may lead to ambiguous identification, and the result of DTCWPT with screening contains some meaningless information.

Hilbert envelope spectrum by IITD with energy ratio screening.

Hilbert envelope spectrum by DTCWPT with energy ratio screening.
The proposed DTCWPT-IITD is then used to extract the meaningful information from the simulated signal, and Figure 7 shows the result. Compared with the spectra obtained by DTCWPT with screening and IITD with screening, the spectrum based on DTCWPT-IITD presents the frequency components embedded in the simulated signal more clearly. 27 Figure 7 shows two peaks, at 20 and 60 Hz, which are the modulating frequencies. Furthermore, DTCWPT-IITD successfully eliminates redundant information that remains with the other methods.

Hilbert envelope spectrum by DTCWPT-IITD with energy ratio screening.
Fault singular value extraction based on SVD
In matrix theory, the singular values generated by SVD represent the inherent features of a matrix and possess scale invariance, rotation invariance, and favorable stability. 1 Therefore, the singular values of the matrix whose rows are the desired PRCs are well suited to serve as the feature vector for ELM training and testing.
The matrix consisting of a set of desired PRCs is divided into two initial feature matrices
If
Let
where
Then the initial feature matrices A and B are processed by the SVD, respectively; the obtained singular values
where
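The singular value feature extraction can be sketched as follows. The sketch stacks hypothetical components as the rows of a single matrix and does not reproduce the split into the two initial feature matrices A and B, whose definition is not available in this text. The function name `singular_value_features` is an assumption.

```python
import numpy as np

def singular_value_features(components):
    """Singular values (in descending order) of the matrix whose rows are the components."""
    matrix = np.atleast_2d(np.asarray(components, dtype=float))
    return np.linalg.svd(matrix, compute_uv=False)

# Hypothetical stand-ins for three screened PRCs of one sample.
t = np.arange(1000) / 1000.0
prcs = np.vstack([
    np.sin(2 * np.pi * 20 * t),
    0.5 * np.sin(2 * np.pi * 60 * t),
    0.1 * np.sin(2 * np.pi * 120 * t),
])
features = singular_value_features(prcs)
print(features)
```

Because these three rows are mutually orthogonal, each singular value equals the Euclidean norm of one row, so the feature vector directly reflects how the signal energy is distributed over the components.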
Fault feature classification based on the OS-ELM
Suppose
where
where
where
For the efficiency of sequential learning, it is reasonable to express
For sequential learning, when the (k + 1)th new chunk of data arrives, the recursive method is implemented for acquiring the updated solution.
28
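The initialization and recursive update of OS-ELM can be sketched as below, using the standard recursive least-squares form of the algorithm with a sigmoid hidden layer. The class name, the random data, and the sizes (6 inputs, 6 hidden nodes, chunks of 5) are assumptions chosen for illustration and are not the settings used in the experiments. The initial chunk must contain at least as many samples as there are hidden nodes so that the initial matrix is invertible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OSELM:
    """Minimal OS-ELM: random fixed hidden layer, recursively updated output weights."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.biases = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.P = None
        self.beta = None

    def hidden(self, X):
        return sigmoid(np.asarray(X, dtype=float) @ self.weights + self.biases)

    def fit_initial(self, X0, T0):
        # Initialization phase: batch least squares on the first chunk.
        H0 = self.hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0)
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        # Sequential phase: update P and beta with one new chunk of any size.
        H = self.hidden(X)
        gain = np.linalg.inv(np.eye(len(H)) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ gain @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta

# Hypothetical two-class data with one-hot targets.
rng = np.random.default_rng(1)
X = rng.standard_normal((120, 6))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
T = np.eye(2)[labels]

model = OSELM(n_inputs=6, n_hidden=6)
model.fit_initial(X[:40], T[:40])
for start in range(40, 120, 5):
    model.partial_fit(X[start:start + 5], T[start:start + 5])

# Batch ELM solution on all data, for comparison.
batch_beta = np.linalg.lstsq(model.hidden(X), T, rcond=None)[0]
print(np.max(np.abs(model.predict(X) - model.hidden(X) @ batch_beta)))
```

After all chunks have been processed, the sequential solution agrees with the batch least-squares solution up to rounding error, which is why the chunk size mainly affects training time and not accuracy.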
Experimental analysis and discussion
In this section, an experiment on fault identification of rolling element bearings is carried out to further verify the feasibility and effectiveness of the proposed method. The vibration data of rolling element bearings are provided by Case Western Reserve University (CWRU). 30 The test stand is composed of a three-phase induction motor, a torque transducer/encoder, and a dynamometer. Fault locations cover the inner race, the outer race, and the rolling element, and the defect sizes are 0.007, 0.014, 0.021, and 0.028 in. Vibration data were collected at a sampling frequency of 12 kHz. The characteristic frequencies of the three defect types are obtained from the geometric parameters of the bearing: 141.17 Hz for the rolling element defect, 162.19 Hz for the inner race defect, and 107.37 Hz for the outer race defect.
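The characteristic frequencies quoted above can be checked with the standard kinematic formulas for a bearing with a stationary outer race. The geometry and shaft speed used here are assumptions, since they are not stated in this text: nine balls, a ball diameter of 0.3126 in, a pitch diameter of 1.537 in, a zero contact angle, and a shaft speed of 1797 r/min, which are the values commonly quoted for the CWRU drive-end bearing. The rolling element value is taken as twice the ball spin frequency.

```python
def bearing_frequencies(shaft_hz, n_balls, ball_d, pitch_d):
    """Characteristic defect frequencies in Hz, assuming a zero contact angle."""
    ratio = ball_d / pitch_d
    outer = 0.5 * n_balls * shaft_hz * (1.0 - ratio)           # outer race defect
    inner = 0.5 * n_balls * shaft_hz * (1.0 + ratio)           # inner race defect
    ball = (pitch_d / ball_d) * shaft_hz * (1.0 - ratio ** 2)  # twice the ball spin frequency
    return {"outer": outer, "inner": inner, "ball": ball}

# Assumed geometry and speed (not given in the text).
freqs = bearing_frequencies(shaft_hz=1797 / 60.0, n_balls=9, ball_d=0.3126, pitch_d=1.537)
print({name: round(value, 2) for name, value in freqs.items()})
```

Under these assumptions the computed values agree with the quoted 107.37, 162.19, and 141.17 Hz to within about 0.01 Hz.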
The defect with a diameter of 0.014 in is chosen to test and verify the validity of the presented method. Figure 8 shows the time-domain waveforms of the different bearing health conditions (i.e. normal, outer race defect, inner race defect, and rolling element defect), as well as their corresponding fast Fourier transform (FFT) spectra. It is hard to determine the condition of the rolling bearing from these time-domain waveforms and the corresponding spectra.

Waveform of four conditions of bearing and their FFT spectra: (a, b) normal, (c, d) outer race defect, (e, f) inner race defect, and (g, h) rolling element defect.
To examine the effectiveness of the proposed method, direct demodulation by the Hilbert transform, denoted HT, is applied for comparison. A second comparison uses DTCWPT and IITD without the two-step screening, denoted DTCWPT-IITD without screening. Figure 9 shows the results for the healthy condition using HT, DTCWPT-IITD without screening, and the proposed method. The characteristic frequency of normal rolling element bearings is decided by the shaft rotational speed (

The results for normal using different methods: (a) HT, (b) DTCWPT-IITD without screening, and (c) DTCWPT-IITD.
Figure 10 depicts processing results of the HT, DTCWPT-IITD without screening, and DTCWPT-IITD for a failure of outer race. In this situation, characteristic frequency should be 107.37 Hz (

The results for defect of outer race using three methods: (a) HT, (b) DTCWPT-IITD without screening, and (c) DTCWPT-IITD.

The results for defect of inner race using three methods: (a) HT, (b) DTCWPT-IITD without screening, and (c) DTCWPT-IITD.
Results of the HT, DTCWPT-IITD without screening, and DTCWPT-IITD for the defect of rolling element are presented in Figure 12. The characteristic frequency

The results for rolling element defect using different methods: (a) HT, (b) DTCWPT-IITD without screening, and (c) DTCWPT-IITD.
To provide an intuitive classification result, OS-ELM is used to diagnose the bearing condition after DTCWPT-IITD. Analysis of the desired PRCs obtained by DTCWPT-IITD shows that these PRCs contain the main working condition information. Hence, the matrix whose rows are the desired PRCs of one sample is suitable as the original feature matrix. Next, this feature matrix is processed by SVD to produce the feature vector whose elements are singular values. Thus, four matrices consisting of the feature vectors are generated for the four different bearing conditions.
The feature vectors are divided into training samples and testing samples. Twenty groups of feature vectors from each of the four bearing conditions are chosen for training, and the remaining 20 groups from each of the four bearing conditions are used for testing. Table 1 shows the distribution of these data sets in detail. The trained models are used to classify the test samples, and the bearing working state is then identified from the output of the OS-ELM classifier. The number of hidden-layer nodes is set to 25, the excitation function is a sigmoid function, and the training mode is one-by-one.
Fault identification based on the relevant methods.
IITD: improved intrinsic time-scale decomposition; OS-ELM: online sequential extreme learning machine; DTCWPT: dual-tree complex wavelet packet transform; WPT: wavelet packet transform; ITD: intrinsic time-scale decomposition.
For all the data sets, Table 2 shows that the proposed method, which is composed of DTCWPT-IITD, SVD, and OS-ELM, provides better classification accuracy (99.44%, 99.42%, 99.36%) than DTCWPT (96.25%, 96.19%, 91.75%) and IITD (96.11%, 96.14%, 91.67%). DTCWPT-IITD combines the advantages of DTCWPT and IITD and, in addition, eliminates most of the redundant features through the two-stage screening processes. Hence, the proposed method has a significantly stronger ability to extract the features of bearing conditions than the traditional methods.
The average classification accuracy of OS-ELM based on the relevant methods.
IITD: improved intrinsic time-scale decomposition; OS-ELM: online sequential extreme learning machine; DTCWPT: dual-tree complex wavelet packet transform; WPT: wavelet packet transform; ITD: intrinsic time-scale decomposition.
Moreover, the test accuracy of DTCWPT-IITD + OS-ELM (99.44%, 99.42%, 99.36%) is higher than those of WPT-IITD (97.75%, 97.81%, 94.97%) and DTCWPT-ITD (97.92%, 97.92%, 95.31%). This can be explained by the fact that DTCWPT is nearly shift-invariant and reduces frequency aliasing, in contrast to the traditional WPT. Similarly, IITD overcomes the problems with the interpolation method and the decomposition termination condition that remain in the ITD method. Therefore, the features extracted using DTCWPT-IITD and SVD are more representative than those extracted using WPT-IITD and DTCWPT-ITD.
The classification accuracies on data set C for all the methods (ranging from 91.67% to 99.36%) are lower than those on data set A or B, because data set C contains more classes (seven) than A or B (four). As the number of classes increases, the advantage of the proposed method in classification accuracy becomes more prominent. Table 1 shows that data set C has seven classes, and for it the performance of DTCWPT-IITD + OS-ELM is significantly better than that of the other methods. For data sets A and B, which have four classes, the classification accuracies of the proposed method are also higher than those of the other methods, but the difference is not as significant as for data set C.
To assess the generalization capability of the proposed method, four groups of experiments are carried out to further study its application to fault identification. A data set containing 160 samples is used for classification; it is split so that 20, 40, 60, or 80 samples are used for training, and the calculation procedure is the same as above. Table 3 shows the influence of the number of training samples on the identification accuracy. A larger number of training samples yields higher identification accuracy, and the highest classification accuracy (100%) is reached when the training set contains 60 samples. The proposed method can still classify the four conditions of the rolling bearing when the number of training samples is decreased, which confirms that it can be applied successfully to fault identification even when only limited training samples are available. In addition, with 10 training samples and 10 test samples, the classification accuracy obtained using the proposed method is 89.5%.
The results of fault identification using different training sets and testing sets.
IITD: improved intrinsic time-scale decomposition; OS-ELM: online sequential extreme learning machine; DTCWPT: dual-tree complex wavelet packet transform; SVD: singular value decomposition.
To evaluate the ability of OS-ELM, the classification accuracies of OS-ELM implemented in different chunk-by-chunk training modes are compared with those of the original batch ELM and of an ANN. For the chunk-by-chunk training mode, constant chunk sizes of 1 and 5 as well as a chunk size varying randomly in the range [20, 140] are considered. Table 4 shows the average classification accuracy and training time.
From Table 4, it is clear that the original batch ELM classifier provides better classification performance than the back-propagation (BP) neural network algorithm with the same feature extraction. The training time is also worth noting: the ELM algorithm runs around 50 times faster than BP in training. These results show that ELM provides better generalization performance at a higher learning speed than the traditional BP algorithm. Hence, the combination of DTCWPT-IITD and ELM can achieve outstanding classification results.
Performance comparison of BP, ELM, and OS-ELM.
OS-ELM: online sequential extreme learning machine; ELM: extreme learning machine; BP: back-propagation neural network algorithm.
ELM and OS-ELM with different training modes have similar classification accuracies. In terms of training time, the one-by-one sequential mode takes the longest, and the training speed of OS-ELM increases with the chunk size. As shown in Table 4, the training time falls from 0.1716 to 0.1404 s as the chunk size increases from 1 to 5. The comparison shows that OS-ELM can be implemented to suit the way the data arrive without sacrificing accuracy.
The proposed solutions have been tested on the rolling element bearing data sets provided by Case Western Reserve University (CWRU), which contain roughly equal percentages of bearing failure instances and instances of correct performance. In practice, however, it is not possible to obtain balanced data sets under real operating conditions. 31 How to choose and identify the most suitable classification technique is an open issue. Different metrics for comparing classification techniques on imbalanced data sets will be analyzed in future work, and the influence of the level of imbalance between faulty and healthy working conditions will be discussed.
Conclusion
This article proposed a novel approach based on DTCWPT-IITD, SVD, and OS-ELM for identifying bearing working conditions. Meaningful features are extracted from the measured signal by combining IITD with DTCWPT as a preprocessor and a two-step screening process based on the energy ratio, and OS-ELM is used to identify the running state of rolling element bearings. The validity of the proposed fault diagnosis technique is verified by applying it to simulated and measured signals under different states. In fault feature extraction, unlike the results of the DTCWPT or IITD method alone, which contain many redundant components that are mostly heavily corrupted by noise, the DTCWPT-IITD technique can effectively remove noise from the signal and extract its meaningful frequency features. A comparison of classification accuracy between the proposed method and related methods is also given, and it shows that the proposed method based on DTCWPT-IITD, SVD, and OS-ELM provides better classification accuracy.
Footnotes
Handling Editor: Elsa de Sa Caetano
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Science Foundation of China (grant no. 51577007), the State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources (grant no. LAPS15019), and the Fundamental Research Foundations for the Central Universities (grant no. 2014JBZ017).
