A deep feature extraction method for bearing fault diagnosis based on empirical mode decomposition and kernel function

Abstract

To avoid catastrophic failures in rotating machines, it is of great significance to continuously monitor and diagnose the running state of rolling bearings. In this article, a deep feature extraction method for rolling bearing fault diagnosis based on empirical mode decomposition and kernel function is proposed. First, the vibration signals of the rolling bearing under different states are decomposed by empirical mode decomposition. Second, to extract more representative high-level features, the obtained intrinsic mode functions are preprocessed with singular value decomposition to acquire singular values, which serve as the inputs of the proposed stacked kernel sparse autoencoder network. The proposed method does not depend on prior knowledge of fault diagnosis and does not even require signal denoising, which simplifies the traditional feature extraction process for rolling bearing fault diagnosis. To validate the proposed diagnosis network, experiments and comparisons were carried out. The results demonstrate that the proposed diagnosis method based on empirical mode decomposition and the stacked kernel sparse autoencoder achieves superior performance in rolling bearing fault diagnosis.

Keywords

Rolling bearing, fault diagnosis, empirical mode decomposition, sparse autoencoder, kernel function

Introduction

As the core component of rotating machines, the rolling bearing not only bears on major economic benefits but also has a far-reaching impact on public safety. Therefore, it is of great significance to monitor and diagnose the running state of rolling bearings. Because of the complex working environment of rolling bearings, the collected vibration signals often contain a lot of noise.¹ As a result, to establish a solid foundation for the subsequent fault diagnosis, the first step in bearing fault diagnosis is usually to reduce the noise mixed into the original signal.

Empirical mode decomposition (EMD),²,³ as an adaptive signal processing method, can effectively reduce the noise mixed in a signal by reconstructing it from a subset of the obtained intrinsic mode functions (IMFs),⁴ and several scholars have applied it successfully. Liu et al.⁵ proposed a method combining EMD with a translation-invariant scale-adaptive threshold to reduce seismic random noise and achieved effective denoising in experiments on synthetic seismic signals. Žvokelj et al.⁶ presented a method incorporating kernel principal component analysis (KPCA) with ensemble empirical mode decomposition (EEMD) for signal denoising and examined it on multiscale signal denoising as well as fault diagnosis. Mert and Akan⁷ used EMD and an information-theoretic approach to remove white noise from electroencephalograms and achieved good performance. Although the above methods denoise effectively, they all follow the same procedure: the original signal is first decomposed by EMD and then reconstructed from a subset of the obtained IMFs. However, because EMD decomposes the original signal adaptively according to its intrinsic characteristics, the obtained IMFs themselves properly reflect the inherent characteristics of the signal, especially the high-frequency IMFs.⁸ Thus, these IMFs contain more fault information and are more valuable. Pan and Tsao⁹ proposed a more feasible diagnosis process for multi-fault bearings that uses appropriate IMFs for subsequent envelope analysis. However, the procedure for selecting one or more appropriate IMFs to characterize bearing-fault signatures has not yet been addressed.
Yang et al.¹⁰ put forward a fault diagnosis method based on EMD energy entropy and an artificial neural network (ANN), which adopted the energy of each IMF as the ANN input features to identify the working condition of the rolling bearing. However, because the dimension of each feature vector equals the number of IMFs, this method may miss some information in the original data. To address the shortcomings of the above methods, this article applies singular value decomposition (SVD) to the IMFs, and the obtained singular values are used as the input features of a deep neural network (DNN).

Deep learning theory was first proposed by Hinton et al.¹¹ in 2006, which opened a wave of applications in academia and industry. Because of its strong nonlinear expression ability, deep learning can derive more abstract high-level features from low-level features. Since it was proposed, deep learning has developed rapidly, has made breakthroughs in speech recognition and image recognition, and has also obtained remarkable results in rolling bearing fault diagnosis. Chen et al.¹² took 70 features (from the time domain, frequency domain, and time–frequency domain) extracted from the vibration signal as the input of a DNN and achieved a highly reliable and applicable result in rolling bearing fault diagnosis. Guo et al.¹³ extracted 19 features from the vibration signal and converted them into a feature vector to train the proposed stacked sparse autoencoder (SSAE), obtaining satisfactory precision. Chen and Li¹⁴ used a sparse autoencoder (SAE) to fuse the features of multiple sensors and developed a deep belief network (DBN) to diagnose rolling bearing faults. Although the above-mentioned studies have improved the accuracy of rolling bearing fault diagnosis to some extent, they all need to select and extract features from the vibration signal. In addition, these fault diagnosis methods based on multiple-feature extraction need to avoid the influence of irrelevant features, which not only makes feature extraction tedious but also requires a lot of prior knowledge of rolling bearing fault diagnosis.

The autoencoder (AE) is a widely used deep learning model that is proficient in extracting deep features from unlabeled data through a multi-layer coding process. One drawback is that, although the encoding process of the AE is a nonlinear calculation, finding an appropriate classification is relatively difficult when the data lie in a low-dimensional space. To improve the network structure, a kernel function method is applied to the SAE in this article, and the improved algorithm is called the kernel sparse autoencoder (KSAE). With the singular values derived from the IMFs taken as the inputs of the stacked KSAE network, the results of experiments and comparisons demonstrate that the proposed method achieves superior performance in feature extraction and fault diagnosis.

In this study, we put forward a deep feature extraction method for bearing fault diagnosis based on EMD and kernel function. The proposed algorithm improved the existing deep learning models and was able to extract the features of the bearing signals with a higher accuracy compared to existing methods.

The fundamental theory

Empirical mode decomposition

EMD is a technique suited to non-stationary transient signals; it decomposes a signal into a series of stable data sequences in different frequency bands, each of which is called an IMF. Any signal can be regarded as a combination of several IMFs, and each IMF should meet the following two conditions:

Over the whole time range, the number of local extrema and the number of zero crossings must be equal or differ by at most one.

At any point, the mean value of the two envelopes defined by the local maxima and the local minima is zero.

The specific decomposition steps of the EMD method are as follows:¹⁵

Step 1: Find all the local maxima and minima of the signal $x (t)$ and connect each set with a cubic spline to form the upper envelope $y_{up} (t)$ and the lower envelope $y_{low} (t)$. The average of the upper and lower envelopes is denoted $m_{1} (t)$, where

m_{1} (t) = \frac{1}{2} [y_{up} (t) + y_{low} (t)]

(1)

Step 2: Subtracting $m_{1} (t)$ from the original signal $x (t)$ yields a new sequence $h_{1} (t)$, where

h_{1} (t) = x (t) - m_{1} (t)

(2)

If $h_{1} (t)$ satisfies both of the above-mentioned conditions, then $h_{1} (t)$ is the first IMF of $x (t)$.

Step 3: If $h_{1} (t)$ does not satisfy the two conditions, treat $h_{1} (t)$ as the original signal and repeat the above steps k times until $h_{1 k} (t)$ satisfies them. The first IMF is then obtained; in general, the ith IMF is denoted $c_{i} (t)$, where

c_{i} (t) = h_{ik} (t)

(3)

Step 4: Calculate the residual signal $Res_{1} (t)$, where

Res_{1} (t) = x (t) - c_{1} (t)

(4)

Treat $Res_{1} (t)$ as the original signal and repeat the above-mentioned steps until the second, third, ..., and nth IMFs are obtained.

Step 5: The original signal $x (t)$ can be expressed as the sum of the IMFs decomposed from $x (t)$ and the final residual term $Res_{n} (t)$, where

x (t) = \sum_{i = 1}^{n} c_{i} (t) + Res_{n} (t)

(5)
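The sifting procedure of Steps 1 to 5 can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study and makes several simplifying assumptions: the envelopes are built by linear interpolation (`np.interp`) instead of cubic splines to keep the example dependency-free, sifting stops on a relative-change threshold `tol` rather than on the two IMF conditions, and the function names, `max_iter`, `tol`, and `max_imfs` are hypothetical.

```python
import numpy as np

def local_extrema(x):
    """Indices of strict interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def envelope_mean(x):
    """Mean of upper and lower envelopes, Eq. (1); None if too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    upper = np.interp(t, maxima, x[maxima])  # assumption: linear, not cubic spline
    lower = np.interp(t, minima, x[minima])
    return 0.5 * (upper + lower)

def sift(x, max_iter=50, tol=0.05):
    """Repeat h <- h - m (Eq. 2) until the envelope mean is relatively small."""
    h = x.copy()
    for _ in range(max_iter):
        m = envelope_mean(h)
        if m is None:
            break
        sd = np.sum(m ** 2) / (np.sum(h ** 2) + 1e-12)
        h = h - m
        if sd < tol:  # simplified stopping rule (assumption)
            break
    return h

def emd(x, max_imfs=8):
    """Return (imfs, residual) such that x = sum(imfs) + residual, Eq. (5)."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residual) is None:
            break
        c = sift(residual)         # Eq. (3): next IMF
        imfs.append(c)
        residual = residual - c    # Eq. (4): update residual
    return np.array(imfs), residual

# Noise-free three-tone test signal sampled at 1000 Hz for 1 s
t = np.arange(1000) / 1000.0
x = np.sin(20 * np.pi * t) + 5 * np.sin(100 * np.pi * t) + 9 * np.sin(180 * np.pi * t)
imfs, residual = emd(x)
```

By construction, the IMFs and the final residual add back to the input signal, which is the identity stated in equation (5).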

SVD

The SVD theorem is summarized as follows.

Let $A \in C^{m \times n}$ with $\mathrm{rank} (A) = r$. Then there exist unitary matrices $U \in C^{m \times m}$ and $V \in C^{n \times n}$ such that $A = U Σ V^{H}$, where

Σ = [\begin{matrix} Δ & 0 \\ 0 & 0 \end{matrix}] \in C^{m \times n}

(6)

and $Δ = \mathrm{diag} (σ_{1}, σ_{2}, \dots, σ_{r})$ with $σ_{1} \geq σ_{2} \geq \dots \geq σ_{r} > 0$ .

The numbers $σ_{1}, σ_{2}, \dots, σ_{r}$ together with $σ_{r + 1} = 0, \dots, σ_{n} = 0$ are the non-negative square roots of the eigenvalues of $A^{H} A$ (whose nonzero eigenvalues coincide with those of $A A^{H}$) and are called the singular values of $A$ .

Knowledge of information theory indicates that information is contained in singular values, and the larger the singular value is, the more important information it contains. Therefore, singular values can reveal the main information hidden in high-dimensional data, which is why the singular value is selected as the feature of the signal in this article.
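The statement of the theorem can be checked numerically. The following sketch, which uses an arbitrary random complex matrix (the size and the seed are assumptions chosen only for illustration), verifies the factorization $A = U Σ V^{H}$ and the relation between the singular values and the eigenvalues of $A^{H} A$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Full SVD: U is m x m, Vh is n x n, sigma holds min(m, n) singular values
U, sigma, Vh = np.linalg.svd(A)

# Embed the singular values into the m x n matrix Sigma of Eq. (6)
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(sigma)
A_rebuilt = U @ Sigma @ Vh

# Eigenvalues of the Hermitian matrix A^H A, sorted in descending order
eigvals = np.linalg.eigvalsh(A.conj().T @ A)[::-1]
sqrt_eigvals = np.sqrt(np.clip(eigvals, 0.0, None))
```

Here `np.linalg.svd` returns the singular values already sorted in descending order, matching the ordering $σ_{1} \geq σ_{2} \geq \dots$ used above.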

SAE

The AE is an unsupervised three-layer neural network consisting of an encoder network and a decoder network, as shown in Figure 1. The encoder network connects the input layer and the hidden layer and extracts high-level features of the original data. The decoder network connects the hidden layer and the output layer and reconstructs the input at the output; training the output to equal the input yields the best representation in the hidden layer.

Figure 1.

The structure of AE.

A specific process is presented as follows:

Encoder network: Build a data set $X = \{x_{1}, x_{2}, \dots, x_{N}\}$ that contains N samples, which serve as the input-layer data of the AE and are mapped to a hidden representation $h = \{h_{1}, h_{2}, \dots, h_{N}\}$ by the activation function of the encoder. If each sample contains n data points and the hidden layer has m nodes, each sample $x_{i}$ is an n-dimensional vector and each $h_{i}$ is an m-dimensional vector, and the coding process can be expressed as

h_{i} = s_{f} (W_{1} x_{i} + b_{1}), i = 1, \dots, N

(7)

where $s_{f}$ is the activation function of the encoder, and the parameter set of the encoder is $θ_{1} = \{W_{1}, b_{1}\}$ , where $W_{1}$ is a weight matrix and $b_{1}$ is a bias vector.

Decoder network: the hidden layer vector set $h = \{h_{1}, h_{2}, \dots, h_{N}\}$ is used as the input of the decoder network, which reconstructs the data set $R = \{r_{1}, r_{2}, \dots, r_{N}\}$ through the activation function $s_{g}$ . The decoding process can be expressed as

r_{i} = s_{g} (W_{2} h_{i} + b_{2}), i = 1, 2, \dots, N

(8)

where the parameter set of the decoder is $θ_{2} = \{W_{2}, b_{2}\}$ , where $W_{2}$ is a weight matrix and $b_{2}$ is a bias vector.

Iterative optimization process: the approximate degree of the original and the reconstruction data is measured by mean square error (MSE), and the reconstruction error of AE can be defined as

J_{MSE} (θ) = \frac{1}{N} \sum_{i = 1}^{N} L_{MSE} (x_{i}, r_{i}) = \frac{1}{N} \sum_{i = 1}^{N} (\frac{1}{2} {‖ r_{i} - x_{i} ‖}^{2})

(9)

Then, a gradient descent algorithm is run for multiple iterations to minimize the reconstruction error. Once the reconstruction error is small enough, the hidden layer vector set can be considered to contain most of the information in the original data, and the optimal parameter set $θ = \{W_{1}, b_{1}, W_{2}, b_{2}\}$ is obtained.

However, the above classic algorithm includes no sparse penalty, which can easily lead to over-fitting of the network and a poor clustering effect. To improve on this, the SAE is used in this study; that is, a sparse penalty term is added to the reconstruction error of the AE to form the loss function. In this article, the Kullback–Leibler (KL) divergence is selected as the penalty term of the network, and the loss function can be expressed as

J (θ) = J_{MSE} (θ) + β \sum_{j = 1}^{m} KL (ρ ‖ ρ_{j})

(10)

where $β$ denotes the weight of the sparse penalty term, $ρ$ is the sparsity parameter, and $ρ_{j}$ is the average activation of the jth node in the hidden layer.
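As a hedged illustration of equations (7) to (10), the sketch below evaluates the SAE loss for one batch. Several choices are assumptions not stated in the text: both activation functions are taken to be sigmoids, the KL term uses the standard Bernoulli form $ρ \log (ρ / ρ_{j}) + (1 - ρ) \log ((1 - ρ) / (1 - ρ_{j}))$, and the values of `rho`, `beta`, and the layer sizes are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat), elementwise."""
    return (rho * np.log(rho / rho_hat)
            + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def sae_loss(X, W1, b1, W2, b2, rho=0.05, beta=3.0):
    """Return (total loss, MSE term, sparsity penalty) for a batch X of shape (N, n)."""
    H = sigmoid(X @ W1.T + b1)      # Eq. (7): hidden codes, shape (N, m)
    R = sigmoid(H @ W2.T + b2)      # Eq. (8): reconstructions, shape (N, n)
    j_mse = np.mean(0.5 * np.sum((R - X) ** 2, axis=1))   # Eq. (9)
    rho_hat = H.mean(axis=0)        # average activation of each hidden node
    penalty = np.sum(kl_bernoulli(rho, rho_hat))
    return j_mse + beta * penalty, j_mse, penalty          # Eq. (10)

# Illustrative sizes: N = 8 samples, n = 6 inputs, m = 4 hidden nodes
rng = np.random.default_rng(0)
X = rng.random((8, 6))
W1, b1 = 0.1 * rng.standard_normal((4, 6)), np.zeros(4)
W2, b2 = 0.1 * rng.standard_normal((6, 4)), np.zeros(6)
total, j_mse, penalty = sae_loss(X, W1, b1, W2, b2, rho=0.05, beta=3.0)
```

The penalty vanishes only when every hidden node has average activation exactly $ρ$, so minimizing the total loss pushes most hidden nodes toward low activity while still reconstructing the input.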

SSAE

An SAE is an unsupervised three-layer learning network, and its information extraction ability is limited because it lacks sufficient structure to represent the deep characteristics of the signal. An SSAE uses multiple SAE layers to develop more hidden layers, in which each SAE layer performs a nonlinear transformation of the input samples from the preceding layer to the following one. Subsequently, a back-propagation (BP) algorithm is utilized to fine-tune the network parameters using a supervised approach. The SSAE is a type of DNN, incorporating a supervised and an unsupervised approach.

The proposed method

Feature acquisition based on EMD

After the vibration signal is decomposed by the EMD method, the obtained IMFs represent the components of the signal from high to low frequencies, and fault information is mainly reflected in the high-frequency IMFs. In matrix theory, the singular values are an intrinsic characteristic of a matrix and are stable: when the matrix elements vary slightly, the singular values change very little.¹⁶ In fault diagnosis applications, a matrix can be constructed from the vibration signal, and findings show that a matrix constructed in Hankel form has definite physical significance.¹⁷ Therefore, this study constructs a Hankel matrix from each of the first several IMFs, and the singular values obtained by decomposing the Hankel matrices express the running state of the rolling bearing.

It is assumed that each IMF is a time series of length N, $IMF_{i} = \{x_{1}, x_{2}, \dots, x_{N}\}$ , from which a $(p \times q)$ -dimensional matrix $H_{p \times q}$ can be constructed

H_{p \times q} = (\begin{matrix} x_{1} & x_{2} & \dots & x_{q} \\ x_{2} & x_{3} & \dots & x_{q + 1} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ x_{p} & x_{p + 1} & \dots & x_{N} \end{matrix})

(11)

The SVD of $H_{p \times q}$ can be expressed as

H_{p \times q} = U_{p \times p} Σ_{p \times q} V_{q \times q}^{T}

(12)

where $U$ and $V$ are orthogonal matrices and $Σ$ is a non-negative diagonal matrix, that is

Σ = (\begin{matrix} Δ & 0 \\ 0 & 0 \end{matrix}), \quad Δ = \mathrm{diag} (σ_{1}, σ_{2}, \dots, σ_{r}), \quad σ_{i} = Σ (i, i), \quad i = 1, 2, \dots, r, \quad σ_{1} \geq σ_{2} \geq \dots \geq σ_{r}

(13)

To obtain the largest number of singular values from the constructed Hankel matrix, the product of the number of rows p and the number of columns q should be as large as possible, so the values of p and q depend on the parity of N

p = \begin{cases} \frac{N}{2}, & \text{N is an even number} \\ \frac{N + 1}{2}, & \text{N is an odd number} \end{cases}

(14)

where N represents the length of the IMF and $q = N + 1 - p$ .

It can be seen from the constructed Hankel matrix that adjacent rows differ by only one point. As a result, the rows of the Hankel matrix may be highly correlated and the matrix may be ill-conditioned, so that only a small number of singular values are relatively large while most are small. If an IMF yields r (r > 0) singular values, it is more meaningful to take the s largest (r > s > 0) singular values as the features of this IMF.

As the fault information of the rolling bearing is mainly reflected in the high-frequency IMFs and the number of IMFs obtained by EMD may vary between signals, the first m IMFs are selected and decomposed by SVD in order to unify the feature dimensions of different samples. For each IMF, the s largest singular values are selected as its features. Consequently, an $(m \times s)$ -dimensional vector is obtained as the network input vector for each original signal.
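The feature construction described in this subsection can be sketched in a few lines of NumPy. The function names `hankel_matrix` and `imf_features` are hypothetical, and random arrays stand in for real IMFs; only the choice of p and q from equation (14), the Hankel layout of equation (11), and the selection of the s largest singular values of each of the first m IMFs follow the text.

```python
import numpy as np

def hankel_matrix(x):
    """Build the p x q Hankel matrix of Eq. (11) with p chosen as in Eq. (14)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    p = N // 2 if N % 2 == 0 else (N + 1) // 2
    q = N + 1 - p
    # Entry (i, j) is x[i + j], so adjacent rows are shifted by one sample
    idx = np.arange(p)[:, None] + np.arange(q)[None, :]
    return x[idx]

def imf_features(imfs, m, s):
    """Concatenate the s largest singular values of the first m IMFs."""
    feats = []
    for imf in imfs[:m]:
        sv = np.linalg.svd(hankel_matrix(imf), compute_uv=False)  # descending order
        feats.append(sv[:s])
    return np.concatenate(feats)

# Small check of the layout for N = 10 (p = 5, q = 6)
H = hankel_matrix(np.arange(1, 11))

# Stand-in "IMFs": 6 random series of length 100 (not real EMD output)
rng = np.random.default_rng(0)
fake_imfs = rng.standard_normal((6, 100))
features = imf_features(fake_imfs, m=4, s=20)   # 4 x 20 = 80 features
```

Because the IMFs are ordered from high to low frequency, concatenating the blocks in IMF order keeps the feature vector arranged from the high-frequency components to the low-frequency ones.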

KSAE based on a kernel function

A kernel function is defined through a nonlinear mapping $ϕ$ from the input space R to the feature space H: for all $x, y \in R$ , $K (x, y) = \langle ϕ (x), ϕ (y) \rangle$ , the inner product of the mapped points in H. Because the calculation of the hidden layer in the feature space is carried out through the kernel function, the mapping $ϕ$ itself does not have to be defined explicitly, which greatly reduces the computational complexity of the problem.

Based on the above theory, an improved method, the KSAE, which combines the kernel function and the SAE, is proposed. First, the Gram matrix of the kernel function is calculated and used as the input of the new autoencoder; the coding process changes to

h = s_{f} [W^{(1)} K (x_{i}, x_{j}) + b^{(1)}]

(15)

where $x_{i}, x_{j}$ are any two samples of the training data.

Correspondingly, the decoding function is changed to

K (z_{i}, z_{j}) = s_{g} (W^{(2)} h + b^{(2)})

(16)

The improved SAE network first maps the data to a high-dimensional space; then the high-dimensional data are coded and calculated, and the nonlinear low-dimensional features are obtained. By adding the kernel functions, the original signal components are mapped to a high-dimensional space, which speeds up the coding process and improves the efficiency of extracting the signal characteristics and the classification accuracy. The algorithm structure diagram is shown in Figure 2.

Figure 2.

The structure of KSAE.

The proposed method can be summarized as follows: choose the KSAE as the first layer of the deep network and use the hidden layer of the KSAE as the input of the next SAE layer. Multiple SAE layers are then connected to form a deep network, which we call the stacked kernel sparse autoencoder (SKSAE) in this article.

Selection of the kernel function

Based on the Mercer theorem, any symmetric positive semidefinite function can be used as a kernel function. Common kernel functions include the linear kernel function, the polynomial kernel function, and the radial basis kernel function. A radial basis function is a real-valued function that depends only on distance. The Gaussian radial basis kernel function uses the Euclidean distance as the distance function; the resulting transformation matrix has good properties, and the kernel has only one undetermined parameter, so the complexity of the model is low. Therefore, the Gaussian radial basis function is used in this study; its mathematical expression is as follows

K (x, y) = \exp [\frac{- {‖ x - y ‖}^{2}}{(2 σ^{2})}]

(17)

where $σ$ is a free parameter that sets the width of the kernel.
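A minimal sketch of the Gaussian Gram matrix of equation (17) and of the KSAE coding step of equation (15) is given below. It reflects one reading of equation (15), namely that each row of the Gram matrix computed on the training set is fed to the encoder in place of the raw sample; this reading, the sigmoid activation, the function names, the value of `sigma`, and the layer sizes are all assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(X, Y=None, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2)), Eq. (17)."""
    Y = X if Y is None else Y
    sq_dist = (np.sum(X ** 2, axis=1)[:, None]
               + np.sum(Y ** 2, axis=1)[None, :]
               - 2.0 * X @ Y.T)
    sq_dist = np.maximum(sq_dist, 0.0)   # guard against small negative round-off
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ksae_encode(K, W1, b1):
    """Eq. (15): hidden codes computed from Gram-matrix rows instead of raw samples."""
    return sigmoid(K @ W1.T + b1)

# Illustrative sizes: 12 training samples with 60 features, 32 hidden nodes
rng = np.random.default_rng(0)
X_train = rng.standard_normal((12, 60))
K = gaussian_gram(X_train, sigma=8.0)
W1, b1 = 0.1 * rng.standard_normal((32, 12)), np.zeros(32)
H = ksae_encode(K, W1, b1)
```

Under this reading, the width of the first weight matrix equals the number of training samples, and a test sample would be encoded from its kernel values against the training set, `gaussian_gram(X_test, X_train, sigma)`.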

Fault diagnosis model based on EMD and kernel function

The input of the proposed KSAE network consists of the feature vectors acquired by the EMD-based method, and multiple SAE layers are then stacked on top of it. The first hidden layer, that of the KSAE, is the input to the second SAE, the hidden layer of the second SAE is the input to the third SAE, and so on. Finally, a Softmax regression model is added on top of the network for classification. The process of the entire fault diagnosis model is shown in Figure 3 and involves the following steps:

Step 1: The vibration signal in time domain is decomposed by EMD, and n IMFs are obtained.

Step 2: Since the fault information of the rolling bearing is mainly reflected in the high-frequency IMFs, and to unify the feature dimensions of different samples, select the first m IMFs and decompose them by SVD.

Step 3: For each IMF, select the s largest singular values as its meaningful features, so that each sample yields $(m \times s)$ features, which form the input of the proposed SKSAE network.

Step 4: Divide all the samples into training and testing sets, and train the SAEs layer by layer in an unsupervised way. The hidden layer of each SAE is used as the input to the next SAE until the training process is completed.

Step 5: Add the classifier output layer and fine-tune the network parameters using the BP algorithm.

Step 6: Put the testing set samples into the trained SKSAE network and obtain the classified labels, and the accuracy of fault diagnosis is achieved after comparison with the actual labels.
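Step 6 amounts to taking the most probable class from the Softmax output layer and comparing it with the actual label. The sketch below shows only this final stage under stated assumptions: the deep features entering the classifier are given, the weights `W` and `b` are already trained, and the function names and toy values are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predict(features, W, b):
    """Class index with the highest softmax probability for each sample."""
    return np.argmax(softmax(features @ W.T + b), axis=1)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the actual label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Toy example: 5 samples, 4 classes, identity weights so that the
# predicted class is simply the index of the largest feature
features = np.eye(4)[[0, 1, 2, 3, 1]]
W, b = np.eye(4), np.zeros(4)
y_pred = predict(features, W, b)
y_true = np.array([0, 1, 2, 3, 2])
acc = accuracy(y_true, y_pred)
```

In the full model, `features` would be the output of the last hidden layer of the SKSAE for the testing samples.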

Figure 3.

The flow chart of the entire fault diagnosis model.

Simulation and experiment

Simulation verification

In order to verify the feasibility of the proposed method, four types of simulation signals are established as follows

\left\{ \begin{matrix} x_{1} (t) = \sin (20 π t) + 5 \sin (100 π t) + 9 \sin (180 π t) + X_{1} \\ x_{2} (t) = 3 \sin (40 π t) + 4 \sin (120 π t) + 8 \sin (140 π t) + X_{2} \\ x_{3} (t) = 2 \sin (60 π t) + 6 \sin (80 π t) + 7 \sin (160 π t) + X_{3} \\ x_{4} (t) = 3 \sin (30 π t) + 5 \sin (70 π t) + 7 \sin (200 π t) + X_{4} \end{matrix} \right.

where $X_{1}, X_{2}, X_{3}, X_{4}$ are uniformly distributed on $[-4, 4]$ and are used to simulate the noise produced by vibration.

The four simulation signals are discretized with a sampling frequency of 1000 Hz and a sampling time of 1 s. All four discretized simulation signals are shown in Figure 4.
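For reference, the four simulation signals can be generated as in the sketch below. The amplitudes, angular frequencies, noise range, sampling frequency, and duration follow the definitions above; drawing the noise independently for every sample point, the function name, and the seed are assumptions.

```python
import numpy as np

def simulate_signals(fs=1000, duration=1.0, noise_half_width=4.0, rng=None):
    """Return (t, noisy, clean) for the four simulation signals, each of shape (4, fs*duration)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(fs * duration)) / fs
    # (amplitude, k) pairs for each term a * sin(k * pi * t)
    spec = [
        [(1, 20), (5, 100), (9, 180)],
        [(3, 40), (4, 120), (8, 140)],
        [(2, 60), (6, 80), (7, 160)],
        [(3, 30), (5, 70), (7, 200)],
    ]
    clean = np.array([sum(a * np.sin(k * np.pi * t) for a, k in terms)
                      for terms in spec])
    # Assumption: independent uniform noise on [-4, 4] at every sample point
    noise = rng.uniform(-noise_half_width, noise_half_width, size=clean.shape)
    return t, clean + noise, clean

t, noisy, clean = simulate_signals(rng=np.random.default_rng(0))
```

Calling the function repeatedly with fresh noise gives different samples of the same signal type, which is one way to obtain the 100 samples per type used below.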

Figure 4.

Discretized simulation signals.

For the above-mentioned four kinds of signals, 100 samples were considered for each kind of signal, 50 of which were used for training and 50 for testing. To determine the optimal number of IMFs, each kind of sample is first decomposed by EMD, and the results are shown in Figure 5.

Figure 5.

EMD results of four kinds of signals.

As shown in Figure 5, the first four IMFs are relatively high-frequency components, so they were selected, owing to the useful information mainly reflected in them. For each selected IMF, the 20 largest singular values were taken as its meaningful features. Finally, the singular values of the first four IMFs were concatenated in order of frequency range from high to low, giving 80 features per sample. To verify the feature extraction ability of the proposed method, the first three principal components of the obtained singular values were extracted using the KPCA method, as shown in Figure 6.
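The KPCA projection used for this visualization can be sketched as follows. The kernel used for KPCA is not stated here, so the Gaussian kernel, the value of `sigma`, the function name, and the random stand-in data are all assumptions; the sketch only shows the standard steps of centering the Gram matrix, taking its leading eigenvectors, and scaling them to obtain the projected training points.

```python
import numpy as np

def kpca(X, n_components=3, sigma=1.0):
    """Project the rows of X onto the leading kernel principal components."""
    sq = np.sum(X ** 2, axis=1)
    sq_dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))      # Gaussian Gram matrix (assumption)
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one     # center in feature space
    vals, vecs = np.linalg.eigh(Kc)                # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Projections of the training points: eigenvector times sqrt(eigenvalue)
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Stand-in data: 40 samples with 5 features (not the singular-value features)
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
Z = kpca(X, n_components=3, sigma=2.0)
```

Each row of `Z` gives the three coordinates of one sample, which can be drawn as a three-dimensional scatter plot with one color per signal type.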

Figure 6.

Features visualization.

It can be seen from Figure 6 that simulation signals of the same kind cluster together around an obvious clustering center, and simulation signals of different classes are separated effectively. After the singular values were extracted, the obtained features were used as the input of the proposed SKSAE network. The proposed EMD and SKSAE-based diagnosis network used in this case contains three hidden layers, and the network structure parameter is 80-36-12-4. The input layer consists of 80 neurons representing the 80-dimensional singular values of a sample, and the output layer has 4 neurons representing the four kinds of signals. The learning rates of the KSAE, each SAE, the Softmax classifier, and the fine-tuning process are 0.3, 0.3, 0.3, 2, and 1.2, respectively. With the iterations of the KSAE, each SAE, the Softmax classifier, and the fine-tuning process set to 100, 100, 100, 100, and 300, respectively, an accuracy of 100% was achieved.

Experiment verification

The inter-shaft bearing is a key component of an aircraft engine. To verify the applicability of the proposed method to bearing-fault diagnosis, a double-rotor test rig for aircraft engine inter-shaft bearings is used to simulate different fault types of the bearing. Figure 7 shows the test rig, which consists of an inter-shaft bearing, two motors, three mass disks, and four accelerometers.

Figure 7.

The test rig of aero-engine inter-shaft bearing.

The inter-shaft bearing is mounted between the low-voltage axis and the high-voltage axis, connecting the two motors. Four acceleration sensors are installed on the support bearing pedestals of the two axes to collect the vibration signal of the inter-shaft bearing. The hardware system uses an NI data-acquisition card to collect the required data, and the sampling frequency is set to 25.6 kHz.

In this case, the data sets contain 10 kinds of operating conditions, covering the inner race fault, outer race fault, roller fault, and normal states. The faults are introduced to the inter-shaft bearing under the running conditions of high-voltage-motor single rotation (HR), low-voltage-motor single rotation (LR), and high-voltage-motor/low-voltage-motor relative rotation (HLR), respectively. In addition, a normal condition under two-motor relative rotation is included. The rotation speed of the motors is 1200 r/min under each operating condition. The fault grooves in the bearing are machined by electric spark, as shown in Figure 8. The outer ring fault is displayed by removing one of the roller elements because the outer race and the holder cannot be removed.

Figure 8.

Faults in the inter-shaft bearing.

A total of 200 samples were collected under each working condition, 150 of which were used for training and 50 for testing. Table 1 shows the detailed information of the used data sets.

Table 1.

Description of the inter-shaft operation conditions.

Data set	Bearing operating condition	Motor speed (Hz), HR/LR	Size of training/testing samples	Label
ILR	Inner fault (LR)	0/20	150/50	1
IHR	Inner fault (HR)	20/0	150/50	2
IHLR	Inner fault (HLR)	20/20	150/50	3
OLR	Outer fault (LR)	0/20	150/50	4
OHR	Outer fault (HR)	20/0	150/50	5
OHLR	Outer fault (HLR)	20/20	150/50	6
RLR	Roller fault (LR)	0/20	150/50	7
RHR	Roller fault (HR)	20/0	150/50	8
RHLR	Roller fault (HLR)	20/20	150/50	9
NHLR	Normal (HLR)	20/20	150/50	10

HR: high-voltage-motor single rotation; LR: low-voltage-motor single rotation; HLR: high-voltage-motor/low-voltage-motor relative rotation.

The data sets we used contain 10 kinds of working conditions, and the optimal number of IMFs needed to acquire appropriate information from the signals may vary under different working conditions. That is to say, one signal may need four IMFs to contain all the fault information, while another needs only three or five. Because the training process of the proposed SKSAE algorithm requires every sample to have the same dimension, the same number of IMFs must be chosen for the signals under all working conditions. Fortunately, the SKSAE has a strong capacity for fault tolerance, and less important information is given smaller weight coefficients during transmission. Many comparison experiments show that the proposed SKSAE algorithm performs best when the number of IMFs is six.

The data processing was carried out as follows. First, the non-stationary bearing signals were decomposed by EMD, and several IMFs were obtained. Second, the first six IMFs were selected to construct the Hankel matrices, which were decomposed by SVD, and the s largest singular values of each were taken as its meaningful features. Finally, the singular values were assembled into the input vector of the proposed SKSAE network for each original signal in order to conduct the fault diagnosis of the rolling bearing.

As the number of data sampling points was set to 1000, each IMF could yield a maximum of 500 non-negative singular values when the proposed SVD method was used. However, because the constructed Hankel matrix may be ill-conditioned, it is more meaningful to take only the s largest singular values as the features. In this study, the relationship between the accuracy of the fault diagnosis and the number of selected singular values s is investigated for s between 5 and 50 with a step of 5. This relationship is illustrated in Figure 9.

Figure 9.

Relationship between accuracy and s.

As displayed in Figure 9, when s is between 5 and 50, the accuracy of the proposed method ranges from 95.3% to 97.8%. The diagnostic accuracy reaches its maximum, 97.8%, at s = 10, that is, when 60 singular values (six IMFs with 10 singular values each) are extracted from each sample. For s = 10, three hidden layers are used in the proposed network, and the network structure parameter is 60-32-21-10. The input layer consists of 60 neurons representing the 60-dimensional singular values of a sample, and the output layer has 10 neurons representing the 10 operating conditions of the inter-shaft bearing. The learning rates of the KSAE, each SAE, the Softmax classifier, and the fine-tuning process are 0.3, 0.3, 0.3, 2, and 1.2, respectively. With the iterations of the KSAE, each SAE, the Softmax classifier, and the fine-tuning process all set to 100, only 22 samples were incorrectly classified, corresponding to an accuracy of 97.8%. The multi-class confusion matrix in Figure 10 shows the detailed classification results of the proposed method.

Figure 10.

Multi-class confusion matrix of the proposed method for s = 10.
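A multi-class confusion matrix of this kind can be computed from the actual and predicted labels as sketched below. The labels in the example are synthetic and 0-based (the data sets in Table 1 use labels 1 to 10), and the function name is hypothetical; the sketch is only meant to show how the overall accuracy and the per-class rates are read from the matrix.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of actual class i that were predicted as class j."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # accumulate counts, handling repeated pairs
    return cm

# Synthetic example with three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)

overall_accuracy = np.trace(cm) / cm.sum()          # correct predictions on the diagonal
per_class_rate = np.diag(cm) / cm.sum(axis=1)       # fraction of each actual class recovered
```

The off-diagonal entries show which operating conditions are confused with each other, which the overall accuracy alone does not reveal.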

To highlight the superiority of the proposed method, the following comparative experiments were carried out. In comparison experiment 1, a total of 25 features from the time domain, frequency domain, and time–frequency domain were extracted and taken as the input of the proposed SKSAE network with a structure of 25-20-15-10; the accuracy achieved is 92.5%. Comparison experiment 2 uses the feature-acquisition method proposed in this article and feeds the samples with 60 singular values into an ANN for training and testing; the accuracy achieved is 94.6%. In comparison experiment 3, the 60 singular values used in the proposed method are selected as features and fed into the unimproved SSAE network; the accuracy achieved is 95.8%. All of these comparison experiments are less accurate than the proposed method. In addition, comparison experiment 1 requires the signal features to be selected by hand and a large amount of prior knowledge of fault diagnosis, which leads to a tedious feature extraction process. Although the feature-acquisition method used in comparison experiment 2 is the same as that of the proposed method, the shallow structure of the ANN cannot fully extract the deep characteristics of the data, so its fault diagnosis rate is not as high as that of the method proposed in this article. Comparison experiment 3 demonstrates that the proposed SKSAE algorithm performs better than the SSAE method in feature extraction and fault diagnosis. As a result, the EMD and kernel function–based method proposed in this article not only improves the accuracy of rolling bearing fault diagnosis but also requires little prior knowledge of fault diagnosis, which simplifies the feature extraction process.

Conclusion

In this article, a deep feature extracted method based on EMD combined with kernel function is proposed. Because fault information is mainly reflected in high-frequency IMFs, the Hankel matrices were constructed by the first several IMFs of each sample, and singular values, obtained by decomposing the Hankel matrices, can express the running state of the rolling bearing. Then, the singular values were pieced together corresponding to the first several IMFs, which are assumed as the input vector of the network for each original signal to conduct the fault diagnosis of rolling bearing. In addition, the proposed SKSAE method improved the existing SSAE model and achieved a better performance. Results indicated that the proposed EMD and SKSAE-based diagnosis method has an excellent performance in rolling bearing fault diagnosis. Based on achieved results, the following conclusions are drawn.

After EMD adaptively decomposes the bearing signals according to their intrinsic characteristics, the obtained IMFs reflect the inherent characteristics of the signal, and the singular values obtained from the IMFs represent the fault information of the rolling bearing more strongly.

Compared with traditional fault diagnosis methods, the multi-layer network structure of the proposed method supports its high diagnostic accuracy. Compared with fault diagnosis methods based on the extraction of multiple features, the proposed method does not require extensive prior knowledge of fault diagnosis and eliminates the tedious process of feature selection and extraction.

The proposed SKSAE algorithm improves the existing SSAE model and extracts features of the bearing signals that yield higher diagnostic accuracy than the ANN and SSAE methods used for comparison.

Although the proposed method can extract the deep internal characteristics of bearing data, setting the parameters of the proposed multi-layer network remains difficult. How to optimize the parameters of the network structure is therefore worth further study.
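The feature construction summarized in this section (Hankel matrices built from the first several IMFs, followed by singular value decomposition and concatenation of the singular values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The IMFs are replaced by synthetic sinusoids instead of a real EMD output, the split of the 60 singular values into 6 IMFs with 10 values each is hypothetical, the Hankel row count of 32 is arbitrary, and the names `hankel_matrix` and `singular_value_features` are illustrative.

```python
import numpy as np


def hankel_matrix(signal, n_rows):
    """Build a Hankel matrix H[i, j] = signal[i + j] with n_rows rows."""
    signal = np.asarray(signal, dtype=float)
    n_cols = signal.size - n_rows + 1
    if n_rows < 1 or n_cols < 1:
        raise ValueError("n_rows must be between 1 and len(signal)")
    index = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    return signal[index]


def singular_value_features(imfs, n_imfs, n_values, n_rows):
    """Concatenate the leading singular values of the first n_imfs IMFs."""
    features = []
    for imf in imfs[:n_imfs]:
        # compute_uv=False returns only the singular values, largest first.
        sv = np.linalg.svd(hankel_matrix(imf, n_rows), compute_uv=False)
        features.append(sv[:n_values])
    return np.concatenate(features)


# Synthetic stand-ins for IMFs ordered from high to low frequency.
# A real pipeline would obtain these from EMD of a vibration signal.
t = np.arange(1024) / 1024.0
imfs = np.array([np.sin(2 * np.pi * f * t) for f in (200, 120, 80, 50, 30, 10)])

# Hypothetical split: 6 IMFs x 10 singular values = 60 features per sample.
features = singular_value_features(imfs, n_imfs=6, n_values=10, n_rows=32)
print(features.shape)  # (60,)
```

Each sample then yields one fixed-length vector, which is what allows the singular values to serve directly as the network input without manual feature selection.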

Footnotes

Handling Editor: Javier Cara

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is financially supported by the National Natural Science Foundation of China (grants no.: 51875075 and 51375067).

ORCID iD

Fengtao Wang
