Abstract
To avoid catastrophic failures in rotating machines, it is of great significance to continuously monitor and diagnose the running state of rolling bearings. In this article, a deep feature extraction method for rolling bearing fault diagnosis based on empirical mode decomposition and a kernel function is proposed. First, the vibration signals recorded under different states of the rolling bearing are decomposed by empirical mode decomposition. Second, to extract more representative high-level features, the obtained intrinsic mode functions are preprocessed with singular value decomposition to acquire singular values, which serve as the inputs of the proposed stacked kernel sparse autoencoder network. The proposed method does not depend on prior knowledge of fault diagnosis and does not even require signal denoising, which simplifies the traditional feature extraction process for rolling bearing fault diagnosis. To validate the proposed diagnosis network, experiments and comparisons were carried out. The results demonstrate that the proposed diagnosis method, based on empirical mode decomposition and the stacked kernel sparse autoencoder, outperforms the compared methods in rolling bearing fault diagnosis.
Introduction
As the core component of rotating machines, the rolling bearing not only bears on major economic benefits but also has a far-reaching impact on public safety. Therefore, it is of great significance to monitor and diagnose the running state of rolling bearings. Owing to the complex working environment of rolling bearings, the vibration signals collected in engineering practice often contain a lot of noise. 1 As a result, to establish a solid foundation for the subsequent fault diagnosis of the device, the first step in bearing fault diagnosis is usually to reduce the noise mixed into the original signal.
Empirical mode decomposition (EMD),2,3 as an adaptive signal processing method, can effectively reduce the noise mixed in a signal by reconstructing the signal from a subset of the obtained intrinsic mode functions (IMFs), 4 and several scholars have applied it successfully. Liu et al. 5 proposed a method combining EMD with a translation-invariant scale-adaptive threshold to reduce seismic random noise and achieved effective denoising in experiments on synthetic seismic signals. Žvokelj et al. 6 presented a method that incorporates kernel principal component analysis (KPCA) with ensemble empirical mode decomposition (EEMD) for signal denoising and examined it on multiscale signal denoising as well as fault diagnosis. Mert and Akan 7 utilized EMD and an information-theoretic approach to remove white noise from an electroencephalogram and achieved good performance. Although the above methods achieve adequate denoising, they all follow the same procedure: the original signal is first decomposed by EMD and then reconstructed from a subset of the obtained IMFs. However, because EMD decomposes the original signal adaptively according to its intrinsic characteristics, the obtained IMFs themselves, especially the high-frequency ones, properly reflect the inherent characteristics of the signal. 8 Thus, these IMFs contain more fault information and are more valuable. Pan and Tsao 9 proposed a more feasible diagnosis process for multi-fault bearings that uses appropriate IMFs for subsequent envelope analysis. However, the procedure for selecting one or more appropriate IMFs to characterize bearing-fault signatures has not yet been addressed. Yang et al. 10 put forward a fault diagnosis method based on EMD energy entropy and an artificial neural network (ANN), which adopted the energy of each IMF as the ANN input features to identify the working condition of the rolling bearing. However, because the dimension of each feature vector equals the number of IMFs, this method may miss some information in the original data. To overcome the shortcomings of the above methods, this article applies singular value decomposition (SVD) to the IMFs and uses the obtained singular values as the input features of a deep neural network (DNN).
Deep learning theory was proposed by Hinton et al. 11 in 2006, which set off a wave of applications in academia and industry. Owing to its strong nonlinear expression ability, deep learning can derive more abstract high-level features from low-level features. Since then, deep learning has developed rapidly, has made breakthroughs in speech recognition and image recognition, and has also obtained remarkable results in rolling bearing fault diagnosis. Chen et al. 12 took 70 features (from the time domain, frequency domain, and time-frequency domain) extracted from the vibration signal as the input of a DNN and achieved a highly reliable and applicable result in rolling bearing fault diagnosis. Guo et al. 13 extracted 19 features from the vibration signal, converted them into a feature vector to train the proposed stacked sparse autoencoder (SSAE), and obtained satisfactory precision. Chen and Li 14 used a sparse autoencoder (SAE) to fuse the features of multiple sensors and developed a deep belief network (DBN) to diagnose rolling bearing faults. Although the above studies have improved the accuracy of rolling bearing fault diagnosis to some extent, they all need to select and extract features from the vibration signal. In addition, fault diagnosis methods based on the extraction of multiple features must guard against the influence of irrelevant features, which not only makes feature extraction tedious but also requires a lot of prior knowledge of rolling bearing fault diagnosis.
The autoencoder (AE) is a widely used deep learning model that is proficient in extracting deep features from unlabeled data through a multi-layer coding process. One drawback is that the encoding process of the AE is a nonlinear calculation, and when the data lie in a low-dimensional space, finding an appropriate classification is relatively difficult. To improve the network structure, a kernel function method is applied to the SAE in this article, and the improved algorithm is called the kernel sparse autoencoder (KSAE). With the singular values derived from the IMFs taken as the inputs of the stacked KSAE network, the results of experiments and comparisons demonstrate that the proposed method achieves superior performance in feature extraction and fault diagnosis.
In this study, we put forward a deep feature extraction method for bearing fault diagnosis based on EMD and kernel function. The proposed algorithm improved the existing deep learning models and was able to extract the features of the bearing signals with a higher accuracy compared to existing methods.
The fundamental theory
Empirical mode decomposition
EMD is a technique suited to non-stationary transient signals: it decomposes such a signal into a series of stable data sequences with different frequency bands, each of which is called an IMF. Any signal can be regarded as a combination of several IMFs, and each IMF should meet the following conditions:
Over the whole time range, the number of local extrema and the number of zero crossings must be equal or differ by at most one.
The mean value of two enveloping lines determined by local maximum and local minimum points is equal to zero at any point.
The specific decomposition steps of the EMD method are as follows: 15
Step 1: Find all the local maximum and minimum points of the signal
Step 2: The original signal
If
Step 3: If
Step 4: Calculate the residual signal
Assume
Step 5: The original signal
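The sifting procedure outlined in the steps above can be sketched in a few lines of NumPy. This is a simplified illustration and not the authors' implementation: the envelopes are built by linear interpolation (`np.interp`) rather than the cubic splines usually used in EMD, sifting stops after a fixed number of passes instead of a convergence criterion, and the names `local_extrema`, `envelope_mean`, `emd`, `max_imfs`, and `sift_iters` are hypothetical.

```python
import numpy as np

def local_extrema(x):
    """Return the indices of the interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if x has too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    # Assumption: linear interpolation stands in for cubic-spline envelopes.
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return 0.5 * (upper + lower)

def emd(signal, max_imfs=6, sift_iters=10):
    """Decompose signal into IMF-like components plus a residual."""
    residual = np.asarray(signal, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residual) is None:
            break  # residual is (nearly) monotonic: stop decomposing
        h = residual.copy()
        for _ in range(sift_iters):
            m = envelope_mean(h)
            if m is None:
                break
            h = h - m  # subtract the envelope mean (one sifting pass)
        imfs.append(h)
        residual = residual - h  # remove the extracted component
    return np.array(imfs), residual

# Usage: a two-tone test signal sampled at 1000 Hz for 1 s.
t = np.arange(1000) / 1000.0
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 5 * t)
imfs, residual = emd(x)
```

By construction, the extracted components and the residual add back up to the original signal, which is the reconstruction property stated in the last step.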
SVD
The SVD theorem is summarized as follows.
Let
and
The numbers
Knowledge of information theory indicates that information is contained in singular values, and the larger the singular value is, the more important information it contains. Therefore, singular values can reveal the main information hidden in high-dimensional data, which is why the singular value is selected as the feature of the signal in this article.
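The statement that the larger singular values carry more of the information can be illustrated with the standard low-rank approximation property of the SVD: keeping only the k largest singular values leaves a Frobenius-norm error equal to the root sum of squares of the discarded ones. The sketch below uses a random matrix purely for illustration; the matrix `A` and the rank `k` are hypothetical and not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 8))  # hypothetical data matrix

# Thin SVD: singular values are returned in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of leading singular values to keep
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]  # rank-k reconstruction

# Reconstruction error versus the energy of the discarded singular values.
err = np.linalg.norm(A - A_k, "fro")
tail = np.sqrt(np.sum(s[k:] ** 2))
```

Here `err` and `tail` agree up to rounding, so the discarded small singular values account exactly for what the truncated representation loses.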
SAE
AE is an unsupervised three-layer neural network consisting of an encoder network and a decoder network, as shown in Figure 1. The encoder network connects the input layer and the hidden layer and extracts high-level features of the original data. The decoder network connects the hidden layer and the output layer and reconstructs the input at the output; training the output to equal the input yields the best representation in the hidden layer.

The structure of AE.
A specific process is presented as follows:
Encoder network: Build a data set
where
Decoder network: the hidden layer vector set
where the parameters set of decoder are
Iterative optimization process: the approximate degree of the original and the reconstruction data is measured by mean square error (MSE), and the reconstruction error of AE can be defined as
Then, a gradient descent algorithm is utilized to optimize multiple iterations, and it can be considered that the hidden layer vector set contains most of the information in the original data until the reconstruction error is small enough. Besides, the optimal parameters set
However, the above classic algorithm contains no sparse penalty, which can easily lead to over-fitting of the network and a poor clustering effect. To address this, the SAE is used in this study; that is, a sparse penalty term is added to the reconstruction error of the AE to form the loss function. In this article, the Kullback-Leibler (KL) divergence is selected as the penalty term of the network, and the loss function can be expressed as
where
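A sparse autoencoder loss of this kind can be sketched as follows. Because the article's exact formula is not reproduced here, the sketch relies on common conventions that should be read as assumptions: sigmoid activations in the encoder and decoder, a mean squared reconstruction error, a target activation `rho`, a penalty weight `beta`, and no weight-decay term. All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_sparsity(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j) for Bernoulli variables."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def sae_loss(X, W1, b1, W2, b2, rho=0.05, beta=3.0):
    """Return (total loss, reconstruction MSE, KL penalty) for a batch X."""
    H = sigmoid(X @ W1 + b1)       # encoder: input -> hidden layer
    X_hat = sigmoid(H @ W2 + b2)   # decoder: hidden layer -> reconstruction
    mse = np.mean((X_hat - X) ** 2)
    rho_hat = H.mean(axis=0)       # average activation of each hidden unit
    kl = kl_sparsity(rho, rho_hat)
    return mse + beta * kl, mse, kl

# Usage with small random, untrained parameters (illustration only).
rng = np.random.default_rng(1)
X = rng.random((10, 8))
W1, b1 = rng.normal(0, 0.1, (8, 3)), np.zeros(3)
W2, b2 = rng.normal(0, 0.1, (3, 8)), np.zeros(8)
loss, mse, kl = sae_loss(X, W1, b1, W2, b2)
```

The KL term vanishes only when every hidden unit's average activation equals `rho`, so minimizing the total loss pushes most hidden units toward low activity while still reconstructing the input.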
SSAE
An SAE is an unsupervised three-layer learning network, and its information extraction ability is limited because it lacks sufficient structure to represent the deep characteristics of the signal. An SSAE uses multiple SAE layers to develop more hidden layers, in which each SAE layer performs a nonlinear transformation of the input samples from the preceding layer to the following one. Subsequently, a back-propagation (BP) algorithm is utilized to fine-tune the network parameters using a supervised approach. The SSAE is a type of DNN, incorporating a supervised and an unsupervised approach.
The proposed method
Feature acquisition based on EMD
After the vibration signal is decomposed by the EMD method, the obtained IMFs represent the components of the signal from high to low frequencies, and fault information is mainly reflected in the high-frequency IMFs. In matrix theory, the singular values of a matrix are an intrinsic characteristic of the matrix and are stable: when a matrix element varies slightly, the singular values change very little. 16 In fault diagnosis applications, a matrix can be constructed from the vibration signal, and findings show that a matrix constructed in Hankel form has definite physical significance. 17 Therefore, this study constructs Hankel matrices from the first several IMFs, and the singular values obtained by decomposing these Hankel matrices can express the running state of the rolling bearing.
It is assumed that each IMF is an N-length time series,
SVD of
where
In order to obtain the largest number of singular values of the constructed Hankel matrix, the product of the number of rows
where
It can be seen from the constructed Hankel matrix that adjacent rows differ by only one point. As a result, the rows of the Hankel matrix are highly correlated and the matrix may be ill-conditioned, so that only a small number of singular values are relatively large, while most of the singular values are small. If an IMF can obtain
As the fault information of the rolling bearing is mainly reflected in the high-frequency IMFs and the number of IMFs obtained by EMD may vary between signals, in order to unify the feature dimensions of different samples, the first
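The feature construction described in this subsection can be sketched as follows. The sketch assumes the IMFs are already available as rows of an array and uses hypothetical names throughout: `p` is the number of leading IMFs kept and `k` is the number of largest singular values kept per IMF (the article's own symbols are not reproduced here). The Hankel matrix is made as close to square as possible so that the number of singular values is as large as possible.

```python
import numpy as np

def hankel_matrix(x):
    """Build a near-square Hankel matrix H[i, j] = x[i + j] from a series x."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    m = (N + 1) // 2   # number of rows
    n = N - m + 1      # number of columns, so that m + n - 1 = N
    idx = np.arange(m)[:, None] + np.arange(n)[None, :]
    return x[idx]

def singular_value_features(imfs, p, k):
    """Concatenate the k largest singular values of the first p IMFs."""
    feats = []
    for imf in imfs[:p]:
        # Singular values come back in descending order.
        s = np.linalg.svd(hankel_matrix(imf), compute_uv=False)
        feats.append(s[:k])
    return np.concatenate(feats)

# Usage: random stand-ins for four 1000-point IMFs (illustration only).
rng = np.random.default_rng(0)
fake_imfs = rng.standard_normal((4, 1000))
H = hankel_matrix(fake_imfs[0])                          # 500 x 501 matrix
features = singular_value_features(fake_imfs, p=4, k=20) # 80 features
```

For a 1000-point series this gives a 500 by 501 matrix and therefore 500 singular values, matching the maximum quoted later for the experimental data.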
KSAE based on a kernel function
A kernel function is defined as a nonlinear mapping
Based on the above theory, an improved method, the KSAE, which combines the kernel function and the SAE, is proposed. First, the Gram matrix of the kernel function is calculated and used as the input of the new autoencoder; the coding process changes to
where
Correspondingly, the decoding function is changed to
The improved SAE network first maps the data to a high-dimensional space; then the high-dimensional data are coded and calculated, and the nonlinear low-dimensional features are obtained. By adding the kernel functions, the original signal components are mapped to a high-dimensional space, which speeds up the coding process and improves the efficiency of extracting the signal characteristics and the classification accuracy. The algorithm structure diagram is shown in Figure 2.

The structure of KSAE.
The proposed method can be summarized as follows: choose the KSAE as the first layer of the deep network and use the hidden layer of the KSAE as the input of the next SAE layer. Multiple SAE layers are then connected to form a deep network, which we call the stacked kernel sparse autoencoder (SKSAE) in this article.
Selection of the kernel function
Based on the Mercer theorem, any symmetric positive semidefinite function can be used as a kernel function. Common kernel functions include the linear kernel function, polynomial kernel function, and radial basis kernel function. The radial basis function is a real-valued function that depends only on distance. The Gaussian radial basis kernel function uses the Euclidean distance as the distance function; the resulting transformation matrix is well behaved, and the kernel has only one undetermined parameter, so the complexity of the model is low. Therefore, the Gaussian radial basis function is used in this study; its mathematical expression is as follows
where
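A Gram matrix for the Gaussian radial basis kernel can be computed as in the sketch below. The parameterization exp(-||x - y||^2 / (2 sigma^2)) is one common convention and is an assumption here, as are the names `gaussian_gram` and `sigma`; the article's own expression may scale the width parameter differently.

```python
import numpy as np

def gaussian_gram(X, Y=None, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 * sigma^2))."""
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq / (2.0 * sigma ** 2))

# Usage: 6 hypothetical samples with 80 features each.
rng = np.random.default_rng(0)
samples = rng.standard_normal((6, 80))
K = gaussian_gram(samples, sigma=10.0)
```

The resulting matrix is symmetric with ones on the diagonal and is positive semidefinite, which is the property the Mercer condition requires; `sigma` is the single free parameter mentioned in the text.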
Fault diagnosis model based on EMD and kernel function
The input layer of the proposed KSAE network receives the feature vectors acquired by the EMD-based method, and multiple SAE layers are then stacked. The hidden layer of the KSAE is the input to the second-layer SAE, the hidden layer of that SAE is the input to the third-layer SAE, and so on. Finally, a Softmax regression model is added on top of the network for classification. The process of the entire fault diagnosis model is shown in Figure 3 and involves the following steps
Step 1: The vibration signal in time domain is decomposed by EMD, and
Step 2: Since the fault information of the rolling bearing is mainly reflected in the high-frequency IMFs, and in order to unify the feature dimensions of different samples, select the first
Step 3: For each IMF, select the largest
Step 4: Divide all the samples into training and testing sets, and train the SAEs layer by layer in an unsupervised way. The hidden layer of each SAE is used as the input to the next SAE until the training process is completed.
Step 5: Add the classifier output layer and fine-tune the network parameters using the BP algorithm.
Step 6: Feed the testing set samples into the trained SKSAE network to obtain the predicted labels; the fault diagnosis accuracy is obtained by comparing them with the actual labels.

The flow chart of the entire fault diagnosis model.
Simulation and experiment
Simulation verification
In order to verify the feasibility of the proposed method, four types of simulation signals are established as follows
where
The four types of simulation signals are discretized with a sampling frequency of 1000 Hz and a sampling time of 1 s. The discretized simulation signals of all types are shown in Figure 4.

Discretized simulation signals.
For the above-mentioned four kinds of signals, 100 samples were considered for each kind of signal, 50 of which were used for training and 50 for testing. To determine the optimal number of IMFs, each kind of sample is first decomposed by EMD, and the results are shown in Figure 5.

EMD results of four kinds of signals.
As shown in Figure 5, the first four IMFs are relatively high-frequency components, so the first four IMFs were used, owing to the useful information being mainly reflected in them. For each selected IMF, the 20 largest singular values were taken as its meaningful features. Finally, the singular values corresponding to the first four IMFs were concatenated in order of frequency range from high to low, giving 80 features per sample. To verify the feature extraction ability of the proposed method, the first three principal components of the obtained singular values were extracted using the KPCA method, as shown in Figure 6.

Features visualization.
It can be seen from Figure 6 that the simulation signals of the same kind can be clustered together with an obvious clustering center, and the simulation signals of different classes can be separated effectively. After extracting the singular values, the obtained features were the input of the proposed SKSAE network. The proposed EMD and SKSAE-based diagnosis network, which was utilized in this case, contains three hidden layers, and the network structure parameter is 80-36-12-4. The input layer consists of 80 neurons representing the 80-dimensional singular values of a sample, and the output layer has 4 neurons representing four different kinds of signals. The learning rate of each SAE, Softmax classifier, and the fine-tuning process is 0.3, 0.3, 0.3, 2, and 1.2, respectively. When the iterations of SKSAE, each SAE, Softmax classifier, and the fine-tuning process were set to 100, 100, 100, 100, and 300, the perfect accuracy (100%) could be achieved.
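The dimensions of the 80-36-12-4 structure can be traced with a plain forward pass, sketched below. This is only an illustration of how the layer sizes fit together: the weights are random and untrained, the kernel mapping of the first KSAE layer and the layer-wise pretraining are omitted, and the sigmoid hidden activations as well as all names (`layer_sizes`, `forward`, and so on) are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

layer_sizes = [80, 36, 12, 4]  # input, two encoding layers, Softmax output
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.1, size=(a, b))
           for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(b) for b in layer_sizes[1:]]

def forward(X):
    """Propagate a batch of 80-dimensional samples to class probabilities."""
    h = X
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(h @ W + b)                      # stacked encoding layers
    return softmax(h @ weights[-1] + biases[-1])    # Softmax classifier

# Usage: five random stand-ins for 80-dimensional singular-value vectors.
X = rng.standard_normal((5, 80))
probs = forward(X)
labels = probs.argmax(axis=1)  # predicted class index per sample
```

In the trained network, the encoding weights would come from the unsupervised layer-by-layer stage and would then be fine-tuned together with the Softmax layer by back-propagation.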
Experiment verification
The inter-shaft bearing is the key component of an aircraft engine. To verify the applicability of the proposed method in the field of bearing-fault diagnosis, a double-rotor test rig for aircraft engine inter-shaft bearings is used to simulate the different fault types of the bearing. Figure 7 shows the test rig, which consists of an inter-shaft bearing, two motors, three mass disks, and four accelerometers.

The test rig of aero-engine inter-shaft bearing.
The inter-shaft bearing is mounted between the low-voltage axis and the high-voltage axis, connecting the two motors. Four acceleration sensors are installed on the support bearing pedestals of the two axes to collect the vibration signal of the inter-shaft bearing. The hardware system uses an NI data-acquisition card to collect the required data, and the sampling frequency is set to 25.6 kHz.
In this case, the data sets contain 10 kinds of operating conditions, covering the inner race fault, outer race fault, roller fault, and normal states. The faults are introduced to the inter-shaft bearing under the running conditions of high-voltage-motor single rotation (HR), low-voltage-motor single rotation (LR), and high-voltage-motor/low-voltage-motor relative rotation (HLR), respectively. In addition, a normal condition with the two motors in relative rotation is included. The rotation speed of the motors is 1200 r/min under each operating condition. The fault grooves in the bearing are machined by electric spark, as shown in Figure 8. The outer ring fault is shown with one of the roller elements taken out, because the outer race and the holder cannot be removed.

Faults in the inter-shaft bearing.
A total of 200 samples were collected under each working condition, 150 of which were used for training and 50 for testing. Table 1 shows the detailed information of the used data sets.
Description of the inter-shaft operation conditions.
HR: high-voltage-motor single rotation; LR: low-voltage-motor single rotation; HLR: high-voltage-motor/low-voltage-motor relative rotation.
The data sets we used contain 10 kinds of working conditions, and the optimal number of IMFs needed to capture the relevant information may vary between working conditions. That is, one signal may need four IMFs to contain all the fault information, whereas another needs only three or five. Because the training process of the proposed SKSAE algorithm requires every sample to have the same dimension, the same number of IMFs must be chosen for the signals under all working conditions. Fortunately, SKSAE has a strong capacity for fault tolerance, and less important information is given smaller weight coefficients during transmission. Many comparison experiments show that the proposed SKSAE algorithm performs best when the number of IMFs is six.
The data processing was carried out as follows. First, the non-stationary bearing signals were decomposed by EMD to obtain several IMFs. Second, the first six IMFs were selected to construct the Hankel matrices, which were decomposed by SVD to obtain the first
As the data sampling points were set to 1000, it was possible for each IMF to achieve a maximum of 500 non-negative singular values when the proposed SVD method was used. However, it is more meaningful to assume that the larger former

Relationship between accuracy and
As displayed in Figure 9, when the value of

Multi-class confusion matrix of the proposed method for
To highlight the advantages of the proposed method, the following comparative experiments were carried out. Comparison experiment 1 extracts a total of 25 features from the time domain, frequency domain, and time-frequency domain and takes them as the input of the proposed SKSAE network with a structure of 25-20-15-10; the accuracy achieved is 92.5%. Comparison experiment 2 uses the feature-acquisition method proposed in this article and feeds the samples with 60 singular values into an ANN for training and testing; the accuracy achieved is 94.6%. In comparison experiment 3, the 60 singular values used in the proposed method are selected as features and fed into the unimproved SSAE network; the accuracy achieved is 95.8%. All of these comparison experiments are less accurate than the proposed method. In addition, comparison experiment 1 requires the signal features to be selected manually and demands a large amount of prior knowledge of fault diagnosis, which leads to a tedious feature extraction process. Although the feature-acquisition method used in comparison experiment 2 is the same as that of the proposed method, the shallow structure of the ANN cannot fully extract the deep characteristics of the data, so its fault diagnosis accuracy is lower than that of the method proposed in this article. Comparison experiment 3 demonstrates that the proposed SKSAE algorithm performs better in feature extraction and fault diagnosis than the SSAE method. As a result, the EMD and kernel function-based method proposed in this article not only improves the accuracy of rolling bearing fault diagnosis but also requires little prior knowledge of fault diagnosis, which simplifies the feature extraction process.
Conclusion
In this article, a deep feature extraction method based on EMD combined with a kernel function is proposed. Because fault information is mainly reflected in the high-frequency IMFs, Hankel matrices were constructed from the first several IMFs of each sample, and the singular values obtained by decomposing the Hankel matrices can express the running state of the rolling bearing. The singular values corresponding to the first several IMFs were then concatenated to form, for each original signal, the input vector of the network used to diagnose rolling bearing faults. In addition, the proposed SKSAE method improved the existing SSAE model and achieved a better performance. Results indicated that the proposed EMD and SKSAE-based diagnosis method has an excellent performance in rolling bearing fault diagnosis. Based on the achieved results, the following conclusions are drawn.
Because EMD adaptively decomposes the bearing signals according to their intrinsic characteristics, the obtained IMFs themselves can reflect the inherent characteristics of the signal, and the singular values obtained from the IMFs represent the fault information of the rolling bearing more strongly.
Compared with a traditional fault diagnosis method, the multi-layer network structure of the proposed method guarantees its high diagnostic accuracy. Compared with fault diagnosis methods based on the extraction of multiple features, the proposed method does not require extensive prior knowledge of fault diagnosis and eliminates the tedious process of feature selection and extraction.
The proposed SKSAE algorithm improved the existing deep learning models and was able to extract the features of the bearing signals with a higher accuracy compared to existing methods.
Although the proposed method can deeply extract the internal characteristics of bearing data, the difficulty lies in setting parameters of the proposed multi-layer network. Therefore, it is worth further studying how to optimize the parameters of network structure.
Footnotes
Handling Editor: Javier Cara
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is financially supported by the National Natural Science Foundation of China (grants no.: 51875075 and 51375067).
