A novel deep learning diagnosis scheme for rotating machinery using adaptive local iterative filtering and ensemble hierarchical extreme learning machine

Abstract

Both machine learning and deep learning models have many hyper-parameters to tune, and deep learning models are large and complicated, which makes fault-feature extraction and classification difficult. To address these two problems, a deep learning diagnosis method is proposed in this study by combining adaptive local iterative filtering and an ensemble hierarchical extreme learning machine. Adaptive local iterative filtering and an entropy feature matrix are used to extract fault features, and an ensemble hierarchical extreme learning machine with a deep learning architecture is proposed for unsupervised feature learning and supervised classification. The proposed deep learning diagnosis scheme is tested on fault benchmark datasets under different severity conditions to verify its effectiveness and accuracy. The test results show that the proposed method performs better than the traditional extreme learning machine and other variants.

Keywords

Fault diagnosis, unsupervised feature extraction, adaptive local iterative filtering, extreme learning machine, deep learning

Introduction

The bearings and other components of a rotating machine often fail in harsh environments, which can result in unexpected breakdown and considerable productivity losses.¹ A prompt and accurate diagnosis of fault type and severity is essential to prevent components from catastrophic failures and to optimize maintenance schedules. The intelligent fault diagnosis model plays a key role in machine safety operation and has become a hot research topic in recent years.²

Fault signals usually exhibit nonstationary and nonlinear characteristics due to the nonlinearity of components, such as stiffness and clearances.³ Conventional signal processing tools, such as kurtosis, skewness, and the fast Fourier transform, may not effectively characterize the fault information embedded in nonstationary and nonlinear signals. Hence, an advanced algorithm with good resolution in both the time and frequency domains is the crux of fault signal processing and feature extraction.⁴ The wavelet transform (WT) is a well-known nonlinear signal processing method with a wide range of applications.^5,6 A potential shortcoming of WT is that the decomposition scales depend on the chosen wavelet basis function, so the method is not signal-adaptive. This shortcoming is overcome by another signal decomposition method, empirical mode decomposition (EMD). EMD uses a data-adaptive scheme to extract a set of simple amplitude-modulated and frequency-modulated (AM-FM) components, named intrinsic mode functions (IMFs), from the original vibration signal.⁷ Each EMD iteration finds one intrinsic oscillatory mode contained in the signal and represents it as an IMF. EMD and its variants perform better than WT in extracting sensitive fault symptoms.^8,9 However, EMD also has shortcomings, such as mode aliasing and endpoint effects. A more recent method, adaptive local iterative filtering (ALIF), decomposes signals with smooth low-pass filters to avoid these shortcomings, and the filter length is constructed adaptively by solving a second-order Fokker–Planck partial differential equation.^10,11 These filters satisfy the sufficient conditions derived for the convergence of the iterative filtering (IF) algorithm. Therefore, the components of a nonstationary signal can be obtained more accurately and mode mixing is suppressed. Moreover, the ALIF method has been shown to outperform EMD in fault-feature extraction.^12–14

Another important issue in fault diagnosis is the construction of fault classifiers, which can be achieved by many advanced intelligent classification methods, such as Bayesian network,¹⁵ neural network,⁸ and support vector machine (SVM).¹⁶ However, these methods are based on shallow learning theory and share two problems: much prior domain knowledge is often needed to design the features, and the representative capability of the extracted features can decrease when the fault data come from different data sources or working conditions.¹⁷ Therefore, unsupervised feature learning and extraction has become a hot research topic.^18–20

Recently, deep learning (DL) models have been proposed to learn optimal features in an unsupervised manner by extracting high-level features from low-level features.²¹ In DL, nonlinear transformations are applied from each layer to the next, and the back-propagation algorithm is used to optimize the classification model, which can approximate complex nonlinear functions with small training errors. DL has been widely used in fault diagnosis. Janssens et al.²² used convolutional neural networks (CNNs) to autonomously learn fault features from raw data for several types of bearing faults. Lu et al.²³ investigated the stacked denoising auto-encoder (SDA) for rotating machinery fault detection. Shao and colleagues^24,25 proposed a novel deep auto-encoder fault diagnosis method for gearboxes and locomotive rolling bearings and applied an improved convolutional deep belief network (CDBN) with compressed sensing (CS) to feature learning and fault diagnosis of rolling bearings. Tamilselvan and Wang²⁶ proposed an aircraft engine fault diagnosis method based on the deep belief network (DBN). He et al.²⁷ applied an improved DBN to fault diagnosis of a gear transmission chain. Qi et al.²⁸ proposed an induction motor fault classification model based on a sparse auto-encoder-based deep neural network (DNN).

Current DL-based fault diagnosis methods, although powerful, still have several apparent shortcomings: (1) there are many hyper-parameters in the DL model because of its deep and large structure, and the diagnosis results are sensitive to these hyper-parameters,^29,30 and (2) the structure of the DL model is large and complicated, so many weights and parameters need to be tuned in the training phase and much training time is needed to handle massive fault data. Extreme learning machine (ELM) is a powerful fault classification algorithm with remarkable performance and fast training speed.³¹ Unlike in traditional methods, the input weights and hidden biases of ELM are chosen randomly and do not need to be tuned, and the generalization performance can be improved.³² Building on ELM, the hierarchical extreme learning machine (H-ELM) was proposed as a DL structure with multiple auto-encoder hidden layers, in which features are learned by ELM-based auto-encoders. Different from traditional DL methods, the hidden layers of H-ELM do not need to be trained in the greedy layer-wise manner, and the optimal solution can be obtained by one-shot learning. Thus, H-ELM needs a much shorter training time than other DL algorithms.³³ Nevertheless, H-ELM sometimes gives unstable and nonoptimal classification results due to the randomness of the assigned parameters.^34,35

In this research, we propose an intelligent fault diagnosis model to tackle the above problems. First, ALIF is used to decompose the fault signals into IMF components. Then, two popular entropy-based fault indicators, permutation entropy (PE) and sample entropy (SampEn), are computed for each IMF component to form the entropy feature matrix. A hierarchical ELM with a DL architecture is adopted to achieve unsupervised feature learning and supervised classification, and a set of H-ELMs is assembled to improve stability; the resulting classifier is referred to as ensemble hierarchical extreme learning machine (EH-ELM) in this article. The DL diagnosis method proposed in this article thus combines ALIF and EH-ELM. The effectiveness and accuracy of the proposed scheme are tested on fault benchmark datasets under different severity conditions, and the results show that the proposed method performs better than the traditional ELM and other variants.

The main contributions of this article are summarized as follows. (1) As a powerful time-frequency analysis technique, the ALIF method is used to obtain multi-scale SampEn and PE features. (2) An ELM-based DL diagnosis model is proposed to improve the diagnosis performance. The diagnosis model learns sensitive features in an unsupervised manner with multiple auto-encoder hidden layers. Compared with the traditional DL model, the proposed model requires no tuning of input weights and has fewer hyper-parameters to determine, making it more suitable for practical engineering problems. Moreover, the ensemble strategy is used to improve the stability of the diagnosis model.

The rest of this article is organized as follows. The “Background” section describes the underlying theory; the “System framework and procedures” section describes the proposed EH-ELM model and the diagnosis scheme; the “Performance and result analysis” section evaluates the effectiveness of the proposed diagnosis model; and the main conclusions are drawn in the “Conclusions” section.

Background

ALIF

The ALIF method is based on the IF technique,¹⁰ and it can improve the performance of IF by adaptively calculating the moving average of signals.¹¹ Inspired by the EMD, the ALIF method contains two loops: the inner iteration produces a single IMF and the outer iteration captures all IMFs. The updating steps of ALIF are described as follows:

Given a nonstationary signal $y = f (x)$ , $x \in R$ , the moving average of the signal is calculated by convolving $f (x)$ with the filter $ω_{i, j} (x, t)$ , which is obtained by solving the Fokker–Planck equation, and the signal is updated as

f_{n + 1} (x) = f_{n} (x) - \int_{- l_{i, j} (x)}^{l_{i, j} (x)} f_{n} (x + t) ω_{i, j} (x, t) dt

(1)

where $ω_{i, j} (t)$ is a fixed filter with mask length 2 $l_{i, j} (y)$ , and $l_{i, j} (y)$ could be calculated by equation (2)

l_{i, j} (y) = 2 [τ \frac{S}{m}]

(2)

where S is the number of sample points of signal $f (x)$ , m is the number of its maximum and minimum points, and $τ$ is a constant, which is usually fixed at around 1.6.¹¹ Denoting the moving average of $f_{n}$ (the integral term in equation (1)) by $K_{i, j} (f_{n})$ , the fluctuation part of the signal is given as

χ_{i, j} (f_{n}) = f_{n} - K_{i, j} (f_{n})

(3)

If $χ_{i, j} (f_{n})$ meets the stopping criterion of an IMF, or the number of iterations reaches the maximum allowed for the inner loop, $χ_{i, j} (f_{n})$ is taken as the IMF component extracted by the inner iteration.

In the outer iteration, the previous process is applied to the remainder signal $r = f - I_{1} - \dots - I_{k - 1}$ , $k \in N$ , to produce the kth IMF, and ALIF terminates when the stopping criterion is met. A detailed convergence theorem of the ALIF method is given in Cicone et al.¹¹ The flowchart of ALIF is given in Figure 1.
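The two-loop structure described above can be sketched in a few lines. The sketch below is a simplified, non-adaptive iterative filtering routine and not the ALIF algorithm itself: it assumes a fixed triangular filter in place of the Fokker–Planck filters, one mask length per IMF from equation (2), and a relative-energy stopping rule. The function names, the tolerance `tol`, and the limits `max_inner`, `max_imfs`, and `min_extrema` are hypothetical choices made only for illustration.

```python
import numpy as np


def count_extrema(signal):
    """Number of local maxima and minima (sign changes of the first difference)."""
    d = np.diff(signal)
    return int(np.sum(d[:-1] * d[1:] < 0))


def mask_half_length(signal, tau=1.6):
    """Equation (2): l = 2 * floor(tau * S / m)."""
    m = max(count_extrema(signal), 1)
    return 2 * int(np.floor(tau * len(signal) / m))


def moving_average(signal, half_len):
    """Moving average with a normalized triangular filter of 2*half_len + 1 taps."""
    # Assumption: a triangular window stands in for the Fokker-Planck filter.
    half_len = max(1, min(half_len, (len(signal) - 1) // 2))
    w = np.bartlett(2 * half_len + 3)[1:-1]  # drop the two zero end taps
    w = w / w.sum()
    padded = np.pad(signal, half_len, mode="reflect")
    return np.convolve(padded, w, mode="valid")


def extract_imf(signal, tau=1.6, tol=0.01, max_inner=50):
    """Inner loop: subtract the moving average until its energy is negligible."""
    f = np.asarray(signal, dtype=float).copy()
    half_len = mask_half_length(f, tau)
    for _ in range(max_inner):
        avg = moving_average(f, half_len)
        f = f - avg  # fluctuation part, equation (3)
        if np.sum(avg ** 2) <= tol * np.sum(f ** 2):
            break
    return f


def iterative_filtering(signal, max_imfs=10, min_extrema=3):
    """Outer loop: peel IMFs off the remainder until it has too few extrema."""
    r = np.asarray(signal, dtype=float).copy()
    imfs = []
    while len(imfs) < max_imfs and count_extrema(r) >= min_extrema:
        imf = extract_imf(r)
        imfs.append(imf)
        r = r - imf
    return np.array(imfs), r
```

By construction, the extracted components and the remainder add up to the input signal, so the decomposition loses no information; how well the components separate depends on the filter, which is exactly where ALIF differs from this sketch.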

Figure 1.

The flowchart of ALIF.

PE

PE was proposed as a randomness measure for nonstationary time series, which can characterize the complexity of the local order structure in complex system analysis. A low PE value means that the time series is regular. Once an impact fault occurs, the regularity of the time series changes and the PE value increases. The PE value of a time series of N sample points $\{x (t), t = 1, 2, \dots, N\}$ can be computed as described in Yan et al.³⁶ An m-dimensional phase space reconstruction is applied to the time series, and the reconstructed vectors $X (t)$ are given as follows

X (t) = {x (t), x (t + τ), \dots, x (t + (m - 1) τ)}

(4)

where $τ$ is the time delay and m is the embedded dimension. Then, the symbols series with m sample points are acquired from each $X (t)$ as

S (l) = (j_{1}, j_{2}, \dots j_{m})

(5)

where $S (l)$ is a symbol permutation by sorting $X (t)$ values in an increasing order. For K symbol sequences, the PE value could be calculated as follows

H_{p} (m) = - \frac{1}{\ln (m!)} \sum_{j = 1}^{K} p_{j} \ln (p_{j})

(6)

where the factor $1 / \ln (m!)$ normalizes the entropy such that $0 \leq H_{p} (m) \leq 1$ , and $p_{1}, p_{2}, \dots, p_{K}$ are the probabilities (relative frequencies) of the K symbol sequences.
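Equations (4)–(6) translate almost directly into code. The following is a minimal sketch, assuming that the probabilities $p_{j}$ are estimated as relative frequencies of the observed ordinal patterns; the function name and the default values of `m` and `tau` are illustrative and are not taken from the experiments in this article.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy of a 1-D series, following equations (4)-(6)."""
    if m < 2:
        raise ValueError("embedded dimension m must be at least 2")
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for the chosen m and tau")
    counts = Counter()
    for t in range(n_vectors):
        window = x[t : t + (m - 1) * tau + 1 : tau]  # X(t), equation (4)
        pattern = tuple(np.argsort(window, kind="stable"))  # S(l), equation (5)
        counts[pattern] += 1
    # Relative frequency of each observed pattern (assumed estimate of p_j).
    p = np.array(list(counts.values()), dtype=float) / n_vectors
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))
```

A strictly increasing series produces a single ordinal pattern and therefore a PE of zero, whereas white noise visits all m! patterns almost uniformly and gives a value close to one.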

SampEn

SampEn is a statistical measure of the regularity of a time series.³⁷ Considering the time series $\{x (t), t = 1, 2, \dots, N\}$ , its time delay embedding representation is

\begin{matrix} x_{m} (i) = [x (i), x (i + 1), \dots x (i + m - 1)], i = 1, 2, \dots N - m + 1 \end{matrix}

(7)

where m is the compared sequence length.

The distance between two such vectors is defined as

d_{m} [x_{m} (i), x_{m} (j)] = \max_{0 \leq k \leq m - 1} | x (i + k) - x (j + k) |

(8)

Then, SampEn is defined as

\begin{matrix} SampEn (m, r, N) = - \ln [\frac{A^{m} (r)}{B^{m} (r)}] \\ A^{m} (r) = \frac{1}{N - m} \sum_{i = 1}^{N - m} \frac{1}{N - m + 1} w^{m} (i) \\ B^{m} (r) = \frac{1}{N - m} \sum_{i = 1}^{N - m} \frac{1}{N - m + 1} v^{m} (i) \end{matrix}

(9)

where r is the tolerance for accepting matches; $A^{m} (r)$ and $B^{m} (r)$ are the probabilities that two sequences are similar at m + 1 and m points, respectively; and $v^{m} (i)$ and $w^{m} (i)$ are the numbers of vectors satisfying $d_{m} [x_{m} (i), x_{m} (j)] \leq r, i \neq j$ and $d_{m + 1} [x_{m + 1} (i), x_{m + 1} (j)] \leq r, i \neq j$ , respectively.
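As an illustration of equations (7)–(9), the sketch below counts template matches at lengths m and m + 1 and takes the negative logarithm of their ratio. Because the same normalization factor appears in $A^{m} (r)$ and $B^{m} (r)$, it cancels in the ratio and only the raw match counts are needed. Setting the tolerance to a fraction of the standard deviation (`r_factor`) is a common convention assumed here; the source does not state the value used, and the function names are hypothetical.

```python
import math

import numpy as np


def _count_matches(x, length, r, n_templates):
    """Number of ordered template pairs (i != j) within Chebyshev distance r."""
    templates = np.array([x[i : i + length] for i in range(n_templates)])
    # Pairwise maximum absolute difference, equation (8).
    dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
    return int(np.sum(dist <= r)) - n_templates  # remove self-matches


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy -ln(A/B); r = r_factor * std(x) is an assumed convention."""
    x = np.asarray(x, dtype=float)
    n_templates = len(x) - m  # same number of templates for both lengths
    if n_templates < 2:
        raise ValueError("time series too short for the chosen m")
    r = r_factor * np.std(x)
    b = _count_matches(x, m, r, n_templates)      # matches at m points
    a = _count_matches(x, m + 1, r, n_templates)  # matches at m + 1 points
    if a == 0 or b == 0:
        return math.inf
    return -math.log(a / b)
```

The pairwise distance matrix grows quadratically with the series length, so this direct form is only practical for short segments; a regular signal such as a sine wave yields a much smaller value than white noise of the same length.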

H-ELM

ELM is a novel feed-forward network with a single hidden layer, which has the advantage of fast learning.³¹ Assuming that there are N data samples ${(x_{i}, t_{i})}_{i = 1}^{N}$ , where $x_{i} \in R^{d}$ and $t_{i} \in R^{m}$ , the output from the ELM could be calculated as

T = H β

(10)

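H = (\begin{matrix} g (w_{1} x_{1} + b_{1}) & \dots & g (w_{l} x_{1} + b_{l}) \\ ⋮ & ⋱ & ⋮ \\ g (w_{1} x_{N} + b_{1}) & \dots & g (w_{l} x_{N} + b_{l}) \end{matrix})

(11)

where β is the output weight matrix, T is the target matrix of the training samples, H is the hidden layer output matrix, $w_{k}$ and $b_{k}$ are the randomly assigned input weight and bias of the kth of l hidden nodes, and g(·) is the activation function.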

The loss function of ELM is designed to combine training error and output weight norms, which could improve the generalization ability of ELM. Hence, β is calculated analytically as follows

β = H^{†} T

(12)

where $H^{†}$ is the Moore–Penrose pseudo-inverse matrix of H.
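Equations (10)–(12) amount to one random projection followed by one least-squares solve. The class below is a minimal sketch under stated assumptions: a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and one-hot encoded class targets. The class name, the seed handling, and these choices are illustrative and are not taken from the source.

```python
import numpy as np


class ELM:
    """Single-hidden-layer ELM classifier trained with the pseudo-inverse."""

    def __init__(self, n_hidden=40, seed=0):
        self.n_hidden = n_hidden
        self.seed = seed

    def _hidden(self, X):
        # Hidden layer output matrix H with an assumed sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        rng = np.random.default_rng(self.seed)
        # Random input weights and biases are drawn once and never tuned.
        self.W = rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=self.n_hidden)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T  # equation (12)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Training involves no iterative optimization, which is the source of the fast learning speed mentioned above; a different seed gives a different random hidden layer and therefore a slightly different classifier.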

H-ELM extends the ELM algorithm to a DL structure.³³ The training architecture of H-ELM contains two separate stages: unsupervised feature learning and supervised classification. In the first stage, the input vector is mapped into the random feature space of ELM, and high-level sparse features are obtained through K layers of unsupervised learning. The output of each hidden layer can be described mathematically as

H_{i} = g (H_{i - 1}, β)

(13)

where $H_{i}$ and $H_{i - 1}$ are the outputs of the ith hidden layer and of its preceding layer ( $i \in [1, K]$ ), respectively, g(·) is the activation function of the hidden layers, and $β$ is the output weight.

An ELM-based sparse auto-encoder with $L_{1}$ optimization is employed to extract multi-layer sparse features of the training samples; it tries to make the reconstructed data as similar as possible to the input data while capturing more useful hidden information. H-ELM minimizes the objective function described as follows

O_{β} = \underset{β}{\arg min} {{‖ H β - X ‖}^{2} + {‖ β ‖}_{l_{1}}}

(14)

where X is the input vector, and $‖ β ‖_{l_{1}}$ is the penalty term of the training model using the $L_{1}$ -norm. The FISTA algorithm is used in H-ELM to solve the problem described in equation (14).³³
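For readers unfamiliar with FISTA, the sketch below shows how an objective of the form in equation (14) can be minimized with a gradient step, a soft-thresholding step, and a momentum update. It is a generic textbook version and not the H-ELM implementation: the penalty weight `lam`, the fixed iteration count, and the zero initialization are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, thr):
    """Proximal operator of thr * ||.||_1 (element-wise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)


def objective(H, X, beta, lam):
    """||H beta - X||^2 + lam * ||beta||_1, the form of equation (14)."""
    return float(np.sum((H @ beta - X) ** 2) + lam * np.sum(np.abs(beta)))


def fista(H, X, lam=1.0, n_iter=200):
    """Minimize the objective above with FISTA and a constant step size 1/L."""
    # Lipschitz constant of the gradient 2 * H^T (H beta - X).
    L = 2.0 * np.linalg.norm(H, 2) ** 2
    beta = np.zeros((H.shape[1], X.shape[1]))
    y = beta.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = 2.0 * H.T @ (H @ y - X)
        beta_next = soft_threshold(y - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum step that distinguishes FISTA from plain ISTA.
        y = beta_next + ((t - 1.0) / t_next) * (beta_next - beta)
        beta, t = beta_next, t_next
    return beta
```

In an auto-encoder setting, H would be the random hidden-layer output computed from X, so the learned `beta` reconstructs the input from its random features. A larger `lam` drives more entries of `beta` to exactly zero, which is what makes the learned features sparse.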

The final fault classification is computed by the original ELM using the learning feature matrix. The multi-layer structure of H-ELM is shown in Figure 2.

Figure 2.

The DL structure of H-ELM.

Ensemble method by majority voting

Because of its simplicity and random parameter selection, a single H-ELM may show low generalization ability when dealing with complex fault features under noisy environments and various working loads. In this article, an ensemble method is applied to improve the classification accuracy of H-ELM, and the resulting model is named EH-ELM. Majority voting has been widely applied in ensemble learning because of its convenience and intelligibility. For an N-class diagnosis problem, an ensemble of H-ELM classifiers is constructed, each classifier assigns a fault sample j to one of the N classes, and the individual results are combined into the final diagnosis result via majority voting. More specifically, the class that receives the most votes is taken as the predicted label, and the total vote received by each class is calculated as

T_{c} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} r_{ij}^{n}, n = 1, \dots, N

(15)

where $r_{ij}^{n}$ is set to 1 if the predicted label is the same as class n; otherwise, it is set to 0. I is the H-ELM classifier number, J is the total fault sample number, and $T_{c}$ is the final output of EH-ELM.
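The voting rule can be illustrated with a short routine. The sketch assumes that votes are tallied separately for each sample, that labels are integers from 0 to N - 1, and that ties are broken in favor of the lowest label; the tie rule and the function name are assumptions, since the source does not specify them.

```python
import numpy as np


def majority_vote(predictions, n_classes):
    """Combine label predictions of shape (I classifiers, J samples).

    Returns the winning label of every sample and the vote table of shape
    (n_classes, J). Ties go to the lowest label (an assumed convention).
    """
    predictions = np.asarray(predictions)
    votes = np.zeros((n_classes, predictions.shape[1]), dtype=int)
    for n in range(n_classes):
        # r_ij^n is 1 where classifier i assigns sample j to class n.
        votes[n] = np.sum(predictions == n, axis=0)
    return votes.argmax(axis=0), votes
```

For example, if three classifiers label a sample as 0, 0, and 1, class 0 receives two votes and becomes the ensemble output for that sample.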

System framework and procedures

A novel DL diagnosis scheme is proposed in this section, which consists of five main steps. In order to evaluate the performance of the developed DNN-based fault diagnosis system in feature extraction and identification, four pretreatment schemes are considered.

Step 1: The vibration fault data of machine components are collected by condition monitoring system, and ALIF is used to obtain IMFs with different frequency sub-band according to equations (1)–(3).

Step 2: The feature matrix of the first m IMFs is constructed by calculating the PE and SampEn of every sub-signal decomposed by ALIF according to equations (4)–(9).

Step 3: Several H-ELM models are initialized with the same structure and hidden node number according to equations (10)–(14). Then, the training and testing datasets are constructed from the original datasets according to a predetermined percentage, and all H-ELM models are trained on the training dataset.

Step 4: Majority voting is used to combine the outputs of these H-ELMs, and the final result is the class with the highest number of votes.

Step 5: Finally, the EH-ELM classifier is used to identify the fault types of the test samples. The overall flow diagram is shown in Figure 3.
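Step 2 turns every decomposed sample into one row of the feature matrix. The helper below is a minimal sketch of that bookkeeping: it applies a list of feature functions to the first IMFs of each sample and concatenates the results. The function name and the ordering of the columns (all features of the first IMF, then all features of the second, and so on) are assumptions; in the proposed scheme the feature functions would be PE and SampEn.

```python
import numpy as np


def build_feature_matrix(imfs_per_sample, feature_fns, n_imfs):
    """Stack per-IMF features into a matrix of shape
    (n_samples, n_imfs * len(feature_fns)).

    imfs_per_sample: iterable of arrays of shape (at least n_imfs, length).
    feature_fns: callables mapping a 1-D array to a scalar.
    """
    rows = []
    for imfs in imfs_per_sample:
        imfs = np.asarray(imfs, dtype=float)
        if imfs.shape[0] < n_imfs:
            raise ValueError("sample has fewer IMFs than n_imfs")
        # Column order: features of IMF 1, then features of IMF 2, and so on.
        rows.append([fn(imf) for imf in imfs[:n_imfs] for fn in feature_fns])
    return np.asarray(rows, dtype=float)
```

With 10 IMFs and two entropy features per IMF, each sample is described by a 20-dimensional feature vector.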

Figure 3.

The flow diagram of the proposed diagnosis scheme.

Performance and result analysis

Experiment description

The EH-ELM diagnosis model was applied to the Case Western Reserve University (CWRU) fault diagnosis benchmark datasets.³⁸ The rolling element bearing test stand consists of an induction motor on the left, a torque transducer/encoder at the center, and a dynamometer on the right. Figure 4 shows the test rig. All numerical computations were implemented in MATLAB 2015b on a laptop computer with an Intel Core i7 CPU.

Figure 4.

The bearing fault test rig.

We chose fault signals with defect sizes of 0.007, 0.014, and 0.021 in, recorded at 0 hp load with a sampling frequency of 48,000 Hz. To construct the feature matrix, the length of each sub-signal was set to 4096 sample points. Figure 5 shows the waveforms of the nine fault types.
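Cutting a long record into fixed-length sub-signals can be sketched as follows. The source gives only the sub-signal length of 4096; the use of non-overlapping windows and the dropping of the incomplete tail are assumptions, and the function name is hypothetical.

```python
import numpy as np


def segment_signal(signal, length=4096):
    """Split a 1-D record into non-overlapping sub-signals of equal length.

    Samples left over at the end of the record are discarded (assumption).
    """
    signal = np.asarray(signal)
    n_segments = len(signal) // length
    return signal[: n_segments * length].reshape(n_segments, length)
```

Each row of the returned array is then decomposed and converted into one feature vector.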

Figure 5.

The waveform of nine faults.

Each sub-signal was decomposed by ALIF. Figure 6 shows the sub-signals decomposed from an inner race fault waveform. The first 10 IMFs, which accounted for 95% of the accumulated energy of the original signal, were chosen to construct the feature matrix. The PE and SampEn of each IMF were then calculated to represent the fault type and severity.
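The energy criterion used to choose the number of IMFs can be written as a cumulative sum. The sketch below assumes that the reference energy is the sum of the energies of all extracted IMFs, which is one possible reading of the accumulated energy of the original signal; the function name and this choice of reference are assumptions.

```python
import numpy as np


def n_imfs_for_energy(imfs, threshold=0.95):
    """Smallest number of leading IMFs whose cumulative energy share
    reaches the threshold."""
    energy = np.sum(np.asarray(imfs, dtype=float) ** 2, axis=1)
    cumulative = np.cumsum(energy) / np.sum(energy)
    # Index of the first cumulative share that is >= threshold, plus one.
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(energy))
```

Applying such a rule to every sub-signal and taking a common count keeps the feature vectors of all samples at the same length.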

Figure 6.

ALIF decomposed results.

Three experimental cases were carried out to evaluate the performance of the proposed method based on the above experimental data. Case 1 has three bearing fault types (inner race fault, outer race fault, and ball fault) with severity levels of 0.007 in and 0.014 in; Case 2 has the same bearing fault types as Case 1, but with severity levels of 0.007 in and 0.021 in; and Case 3 contains all nine fault classes. In each experiment, the dataset was shuffled randomly, one part was selected as training data, and the rest was used as testing data. The settings of each experiment are shown in Table 1.

Table 1.

The detailed setting of each experiment.

Case	Fault label	Operating condition	Label of classification	Defect size (inches)	Number of training samples	Number of testing samples
1	IR007	Inner race	1	0.007	500	209
	IR014	Inner race	2	0.014
	OR007	Outer race	3	0.007
	OR014	Outer race	4	0.014
	B007	Ball	5	0.007
	B014	Ball	6	0.014
2	IR007	Inner race	1	0.007	500	210
	IR021	Inner race	2	0.021
	OR007	Outer race	3	0.007
	OR021	Outer race	4	0.021
	B007	Ball	5	0.007
	B021	Ball	6	0.021
3	IR007	Inner race	1	0.007	800	265
	IR014	Inner race	2	0.014
	IR021	Inner race	3	0.021
	OR007	Outer race	4	0.007
	OR014	Outer race	5	0.014
	OR021	Outer race	6	0.021
	B007	Ball	7	0.007
	B014	Ball	8	0.014
	B021	Ball	9	0.021

Result and analysis

In this section, the performance of the ALIF entropy features is verified and compared with that of the traditional EMD energy features. The feature matrices extracted by the two methods were classified by the original ELM,³² EE-ELM,³² DE-ELM,³⁴ and the proposed EH-ELM for bearing fault diagnosis. All of these methods were applied in the same test experiments under similar working conditions. The proposed EH-ELM consists of 10 H-ELM classifiers with the same structure, and each H-ELM has two hidden layers with 40 nodes, yielding a total node number of 400. The node parameters of ELM, EE-ELM, and DE-ELM are the same as in Zhou et al.,³⁴ and the hidden node number is 400. The average accuracy and the corresponding standard deviation over 30 trials are used to indicate the performance of EH-ELM and the other classifiers. Given the accuracies $\{x_{1}, x_{2}, \dots, x_{n}\}$ obtained in the n trials, the average accuracy and standard deviation are given as follows

\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}

(16)

s = \sqrt{\frac{{(x_{1} - \bar{x})}^{2} + {(x_{2} - \bar{x})}^{2} + \dots + {(x_{n} - \bar{x})}^{2}}{n}}

(17)
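The two statistics can be computed as shown below. The sketch reads $x_{i}$ as the accuracy obtained in trial i, which is an assumption about the notation, and it divides by n as in equation (17), that is, it uses the population form of the standard deviation. The function name and the accuracy values in the usage note are hypothetical.

```python
import numpy as np


def summarize_trials(accuracies):
    """Mean and population standard deviation (divide by n) of trial accuracies."""
    x = np.asarray(accuracies, dtype=float)
    # ddof=0 matches the division by n in equation (17).
    return float(x.mean()), float(x.std(ddof=0))
```

For three hypothetical trial accuracies of 96%, 98%, and 100%, the function returns a mean of 98 and a standard deviation of about 1.63.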

Table 2 and Figure 7 show that all classifiers achieve higher classification accuracy with the ALIF entropy features than with the EMD energy features. In the three tests, the diagnosis accuracy of EH-ELM is 98.80%, 99.85%, and 97.66% with ALIF + PE and SampEn, and 96.33%, 99.60%, and 94.55% with EMD + Energy, respectively. The other classifiers show similar trends.

Table 2.

Classification results of different algorithms.

	ALIF + PE and SampEn			EMD + Energy
	Test1	Test2	Test3	Test1	Test2	Test3
	Average accuracy (%) ± standard deviation
ELM²⁵	95.02 ± 0.78	97.27 ± 0.89	92.94 ± 1.30	90.25 ± 1.87	92.93 ± 1.04	90.76 ± 1.27
DE-ELM²⁵	95.86 ± 0.78	97.68 ± 1.26	94.14 ± 2.08	93.10 ± 2.16	97.05 ± 1.31	91.88 ± 1.23
EE-ELM²⁵	95.88 ± 0.42	98.27 ± 0.47	93.84 ± 0.98	92.55 ± 1.01	93.72 ± 0.63	91.35 ± 0.82
EH-ELM	98.80 ± 0.81	99.85 ± 0.68	97.66 ± 0.92	96.33 ± 1.63	99.60 ± 0.72	94.55 ± 0.86

ALIF: adaptive local iterative filtering; PE: permutation entropy; EMD: empirical mode decomposition; ELM: extreme learning machine.

Figure 7.

Comparison of results obtained by ELM and other variants.

These results show that the ALIF + PE and SampEn features have better representation ability. Given the same feature matrix, EH-ELM achieves the highest accuracy and the original ELM the lowest in every test, while EE-ELM and DE-ELM fall in between. This indicates that the generalization performance of EH-ELM is improved by the unsupervised feature learning mechanism, which is more obvious in classification problems with a large number of class labels. In particular, the improvement is most significant in Test3, indicating that EH-ELM is well suited to multi-class problems.

Our method is also compared with a back-propagation neural network (BPNN), a multi-layer perceptron (MLP), and a standard deep auto-encoder, all of which use the entropy matrix extracted by ALIF as the feature vector. Table 3 shows the results of these methods on the Test3 dataset. Ten trials are carried out for each dataset in Table 1, and the results are shown in Figure 8. The average accuracies of EH-ELM and the standard deep auto-encoder are 97.66% and 97.32%, respectively, which are much higher than those of MLP and BPNN (93.88% and 65.67%, respectively). Thus, EH-ELM performs similarly to the standard deep auto-encoder in accuracy. However, the average computation time of EH-ELM is 22.12 s, which is shorter than that of the standard deep auto-encoder, MLP, and BPNN (179.74, 37.67, and 38.09 s, respectively). It can be seen from Table 3 that (1) EH-ELM shows better classification accuracy than shallow models such as MLP and BPNN, and (2) the proposed method has obvious advantages over the standard deep auto-encoder in terms of computation time.

Table 3.

Average accuracy and computation time of the four methods.

	Average accuracy (%)	Standard deviation of accuracy	Average computation time (s)
EH-ELM	97.66	±0.92	22.12
Standard deep auto-encoder	97.32	±0.35	179.74
MLP	93.88	±0.75	37.67
BPNN	65.67	±0.48	38.09

ELM: extreme learning machine; MLP: multi-layer perceptron; BPNN: back-propagation neural network.
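To put the computation times of Table 3 in relative terms, the ratios of each method's average time to that of EH-ELM can be worked out directly from the tabulated values (rounded to one decimal place):

```latex
\frac{179.74}{22.12} \approx 8.1, \qquad
\frac{37.67}{22.12} \approx 1.7, \qquad
\frac{38.09}{22.12} \approx 1.7
```

That is, on this dataset EH-ELM is roughly eight times faster than the standard deep auto-encoder and roughly 1.7 times faster than MLP and BPNN.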

Figure 8.

Comparison results with standard auto-encoder, BPNN, and MLP.

Finally, the intuitive classification results of EH-ELM are shown in Figure 9, where different fault types are represented by different symbols. The Y-ordinate indicates the type of fault output from the three test cases, and the X-ordinate indicates the fault sample number. It is shown that the proposed method can reliably recognize different fault categories and severity levels.

Figure 9.

Classification results using EH-ELM: (a) Test1, (b) Test2, and (c) Test3.

Discussion

The classification results of the bearing fault diagnosis experiments show that the proposed method based on EH-ELM can significantly improve identification accuracy and generalization performance, which can be attributed to the following: (a) the DL architecture of EH-ELM can find the intrinsic fault information and increase the classification precision, and (b) the ensemble learning strategy of EH-ELM improves the stability of the results. Compared with the original ELM, EH-ELM improves the classification accuracy by 3.78, 2.58, and 4.72 percentage points on Test1, Test2, and Test3, respectively (ALIF features, Table 2). EH-ELM shows better classification performance when there is a relatively large number of fault types. Compared with the standard auto-encoder, the proposed method has better robustness and fewer parameters to be tuned. As parameter tuning is a complex problem for machine learning engineers, EH-ELM is more suitable for practical engineering design.

Conclusions

To extend ELM to the field of DL-based diagnosis, a novel diagnosis scheme consisting of ALIF and EH-ELM is proposed in this study. The proposed DL diagnosis algorithm is applied to the CWRU benchmark datasets under different severity conditions to verify its effectiveness and accuracy. The numerical test results demonstrate that the proposed method is more powerful and robust than other ELM variants and conventional shallow models such as MLP and the back-propagation neural network. The proposed method is also advantageous over the standard deep auto-encoder in terms of computation time, indicating that EH-ELM is more suitable for practical diagnosis applications.

Footnotes

Handling Editor: ZW Zhong

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article is supported by National Key R&D Program of China (2018YFC0406903), the National Science Foundation of China under Grants (No. 51609258), and the National Science Foundation of China under Grants (No. 51779268). In addition, we thank the CWRU center for providing the bearing fault datasets.

ORCID iD

Yu Tian

References

1. Randall, Antoni. Rolling element bearing diagnostics—a tutorial. Mech Syst Signal Pr 2011; 25: 485–520.

2. Wang, Kang, Jiang, et al. Classification of fault location and the degree of performance degradation of a rolling bearing based on an improved hyper-sphere-structured multi-class support vector machine. Mech Syst Signal Pr 2012; 29: 404–414.

3. Tiwari, Gupta, Kankar PK. Bearing fault diagnosis based on multi-scale permutation entropy and adaptive neuro fuzzy classifier. J Vib Control 2013; 21: 461–467.

4. Xue, Zhou. A hybrid fault diagnosis approach based on mixed-domain state features for rotating machinery. ISA Trans 2017; 66: 284–295.

5. Heidari, Shateyi. Wavelet support vector machine and multi layer perceptron neural network with continues wavelet transform for fault diagnosis of gearboxes. J Vibroeng 2017; 19: 125–137.

6. Qin, Mao, Tang. Multicomponent decomposition by wavelet modulus maxima and synchronous detection. Mech Syst Signal Pr 2017; 91: 57–80.

7. Huang, Shen, Long, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. P Roy Soc A: Math Phy 1998; 454: 903–995.

8. Jiao, Jing, Huang, et al. Research on fault diagnosis of airborne fuel pump based on EMD and probabilistic neural networks. Microelectron Reliab 2017; 75: 296–308.

9. Lei, Lin, et al. A review on empirical mode decomposition in fault diagnosis of rotating machinery. Mech Syst Signal Pr 2013; 35: 108–126.

10. Lin, Wang, Zhou. Iterative filtering as an alternative algorithm for empirical mode decomposition. Adv Adapt Data Anal 2009; 1: 543–560.

11. Cicone, Liu, Zhou. Adaptive local iterative filtering for signal decomposition and instantaneous frequency analysis. Appl Comput Harmon Anal 2016; 41: 384–411.

12. Yang. Vibration signal analysis of a hydropower unit based on adaptive local iterative filtering. Proc IMechE, Part C: J Mechanical Engineering Science 2017; 231: 1339–1353.

13. Pan. Wind turbine bearing fault diagnosis based on adaptive local iterative filtering and approximate entropy. Proc IMechE, Part C: J Mechanical Engineering Science 2016; 231: 3228–3237.

14. Local rub-impact fault diagnosis of a rotor system based on adaptive local iterative filtering. T I Meas Control 2015; 39: 748–753.

15. Wang, et al. Fault detection and diagnosis of chillers using Bayesian network merged distance rejection and multi-source non-sensor information. Appl Energ 2017; 188: 200–214.

16. Xiao, Zhou, Xiao, et al. Identification of vibration–speed curve for hydroelectric generator unit using statistical fuzzy vector chain code and support vector machine. J Risk Reliab 2014; 228: 291–300.

17. Mao, et al. Bearing fault diagnosis with auto-encoder extreme learning machine: a comparative study. Proc IMechE, Part C: J Mechanical Engineering Science 2016; 231: 1560–1578.

18. Kang, Kim, et al. Reliable fault diagnosis for low-speed bearings using individually trained support vector machines with Kernel discriminative feature analysis. IEEE T Power Electron 2015; 30: 2786–2797.

19. Boldt FDA, Rauber, Varejão FM. Cascade feature selection and ELM for automatic fault diagnosis of the Tennessee Eastman process. Neurocomputing 2017; 239: 238–248.

20. Jiang, Xuan, Shi. Feature extraction based on semi-supervised kernel Marginal Fisher analysis and its application in bearing fault diagnosis. Mech Syst Signal Pr 2013; 41: 113–126.

21. Lecun, Bengio, Hinton. Deep learning. Nature 2015; 521: 436–444.

22. Janssens, Slavkovikj, Vervisch, et al. Convolutional neural network based fault detection for rotating machinery. J Sound Vib 2016; 377: 331–345.

23. Wang, Qin, et al. Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Process 2017; 130: 377–388.

24. Shao, Jiang, Zhao, et al. A novel deep autoencoder feature learning method for rotating machinery fault diagnosis. Mech Syst Signal Pr 2017; 95: 187–204.

25. Shao, Jiang, Zhang, et al. Rolling bearing fault feature learning using improved convolutional deep belief network with compressed sensing. Mech Syst Signal Pr 2018; 100: 743–765.

26. Tamilselvan, Wang. Failure diagnosis using deep belief learning based health state classification. Reliab Eng Syst Safe 2013; 115: 124–135.

27. Yang, Gan. Unsupervised fault diagnosis of a gear transmission chain using a deep belief network. Sensors 2017; 17: 1564.

28. Shen, Wang, et al. Stacked sparse autoencoder-based deep network for fault diagnosis of rotating machinery. IEEE Access 2017; 5: 15066–15079.

29. Koutsoukas, Monaghan, et al. Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminform 2017; 9: 42.

30. Lorenzo, Nalepa, Kawulok, et al. Particle swarm optimization for hyper-parameter selection in deep neural networks. In: Proceedings of the genetic and evolutionary computation conference, Berlin, 15–19 July 2017, pp.481–488. New York: ACM.

31. Huang, Zhou, Ding, et al. Extreme learning machine for regression and multiclass classification. IEEE T Syst Man Cy B 2012; 42: 513–529.

32. Xiao, Zhou, et al. Multi-fault classification based on the two-stage evolutionary extreme learning machine and improved artificial bee colony algorithm. Proc IMechE, Part C: J Mechanical Engineering Science 2014; 228: 1797–1807.

33. Tang, Deng, Huang GB. Extreme learning machine for multilayer perceptron. IEEE T Neur Net Lear 2016; 27: 809–821.

34. Zhou, Xiao, et al. Multifault diagnosis for rolling element bearings based on intrinsic mode permutation entropy and ensemble optimal extreme learning machine. Adv Mech Eng. Epub ahead of print 14 February 2014. DOI: 10.1155/2014/803919.

35. Noushahr, Ahmadi, Casey. Fast handwritten digit recognition with multilayer ensemble extreme learning machine. In: Bramer, Petridis (eds) Research and development in intelligent systems XXXII. Cham: Springer, 2015, pp.77–89.

36. Yan, Liu, Gao RX. Permutation entropy: a nonlinear statistical measure for status characterization of rotary machines. Mech Syst Signal Pr 2012; 29: 474–484.

37. Richman, Moorman. Physiological time-series analysis using approximate entropy and sample entropy. Am J Physiol: Heart C 2000; 278: H2039–H2049.

38. Loparo KA. Bearings vibration data set. Cleveland, OH: Case Western Reserve University, 2003, http://csegroups.case.edu/bearingdatacenter/home