Abstract
Effective fault diagnosis of motor bearings not only ensures the smooth and efficient operation of equipment but also detects and eliminates running faults in time to prevent major accidents. Based on deep learning, this article constructs a stacked auto-encoder network. A sparsity constraint is introduced to compress the input data and reduce its dimension, so that the network can accurately extract the fault characteristics of the input data, and random noise is introduced to improve the fault recognition ability of the network. The simulation results show that the stacked auto-encoder network not only overcomes the shortcomings of traditional fault diagnosis methods, which require fault samples to be distinguished manually and need a large amount of prior knowledge, but also realizes self-learning of fault signal features. The accuracy of fault identification reaches 98.5%, 94%, 96%, and 95.5% under four different working conditions. Moreover, the network exhibits strong robustness under different working conditions. Finally, new research ideas for fault diagnosis in thermal power plants are put forward by borrowing the approach used for motor bearing fault diagnosis.
Introduction
Motor bearings are an important component of electric motors and are widely used in industrial fields such as electric power production. If the equipment fails, it will affect system operation and may even cause serious economic losses and casualties. Therefore, the effective fault diagnosis of the motor bearing can not only ensure the efficient operation of the systems but also detect and eliminate operational faults in time, effectively preventing major accidents. 1
As vibration signals are highly accurate indicators of the health condition of mechanical equipment, they are widely used in fault diagnosis. 2 The traditional method used to detect motor bearing faults is generally divided into three steps. First, a sensor collects the vibration signal of the motor bearing; then time domain and frequency domain analysis methods are applied to the collected signal; and finally the result indicates whether the motor bearing is faulty.3,4 The analysis methods include multinomial logistic regression (MLR), support vector machine (SVM), wavelet packet transform (WPT), and stability analysis. 5 Wang et al.6–10 used sliding-mode control for fault diagnosis based on nonlinear Markovian jump singular systems. However, the model represented by nonlinear Markovian jump singular systems is random, which may prevent it from matching the real fault characteristics. Several researchers have therefore worked on this problem. Sun and Yang 11 proposed a fault diagnosis system using a link-type neural network with the least squares method and SVM to improve the recognition rate of motor bearing faults. However, the method requires a relatively long calculation time and is therefore inefficient. Liang et al.12,13 used a geometric approach to solve the problem of fault detection and isolation and verified the usefulness of the proposed technique. However, this method is only suitable for discrete systems, so it cannot be widely applied. Liu et al. 14 combined the wavelet transform with empirical mode decomposition and extracted signal features based on the envelope demodulation signal. However, the wavelet transform is not suitable for data with ambient noise. To solve this problem, Pan et al. 15 proposed a fault feature extraction method based on complex wavelet multi-scale decomposition.
However, because of the increasing complexity of rotating machinery systems and the heterogeneity of sensory data, effective diagnosis of multiple health states from sensory data with strong ambient noise and working condition fluctuations remains a major challenge. Possible information loss and external influences further hinder the application of the above methods in complex engineering systems. When the machinery system becomes complicated, it is very difficult to diagnose the health state of the devices from the vibration signal, because it is unrealistic to extract fault features and classify the fault types from complicated sensory data with strong ambient noise. 16
In recent years, deep learning has developed rapidly and has become a hot research topic. 17 Compared with shallow machine learning, deep learning attempts to build more complex nonlinear functions by simulating the way the human brain learns. One of the great advantages of deep learning is that it can use unsupervised training methods to learn and extract the characteristics of the data, which greatly reduces the difficulty of identifying fault data. 18 In addition, deep learning is believed to be able to discover useful high-order feature representations, as well as the relevance of the initial signals, which motivates promising applications to classification tasks with complex and mixed system health states, both effectively and accurately.19,20 To overcome the difficulty of extracting fault features and classifying fault types from complicated sensory data with strong ambient noise, we apply deep learning to the fault diagnosis of motor bearings. Although deep learning techniques have great potential to address these challenges, and there is a pressing need to do so, they are still rarely applied in current fault diagnosis research on electromechanical systems.
This article proposes a stacked auto-encoder (SAE) deep neural network and uses it for fault diagnosis. SAE can learn and extract features from the vibration signal of the bearings. SAE adopts an unsupervised training method, which can independently learn data features and effectively avoid the problem of manually classifying fault data. 21 This article uses SAE to diagnose motor bearing faults in three steps: (1) constructing the SAE deep neural network, adding sparse constraints to improve the compression capability, and introducing random noise into the input information to reconstruct the original data; (2) inputting the original vibration signal, training the SAE network layer by layer, and extracting feature information by self-learning; and (3) testing the accuracy of fault identification on the test data and comparing the fault recognition rate with that of other algorithms, including auto-encoder (AE), deep belief network (DBN), and SVM. We also compare with existing reference results at the end of the article to show the advantage of the proposed method.
Fault diagnosis based on SAE
The description of AE
An SAE is a deep neural network formed by stacking AEs. An AE adjusts its network weights through training and finally makes the output of the network equal to the input of the network. Its network structure is shown in Figure 1, where

Figure 1. The network structure of the auto-encoder.
The transfer process of raw data from the input layer to the hidden layer is called encoding, and the transfer process from the hidden layer to the output layer is called decoding, which can be described by equations (1) and (2)
where
AE tries to learn a function
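As a minimal sketch of the encoding and decoding steps described above, the following Python fragment passes one input vector through a single AE. The sigmoid activation, the parameter names W1, b1, W2, and b2, and the toy weights are assumptions made for illustration; they are not taken from equations (1) and (2).

```python
import math

def sigmoid(z):
    """Logistic activation (assumed here as the AE activation function)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def encode(x, W1, b1):
    """Input layer -> hidden layer (encoding)."""
    return [sigmoid(z) for z in affine(W1, b1, x)]

def decode(h, W2, b2):
    """Hidden layer -> output layer (decoding)."""
    return [sigmoid(z) for z in affine(W2, b2, h)]

# Hypothetical AE with 3 input nodes and 2 hidden neurons, hand-picked weights.
W1 = [[0.5, -0.5, 0.0], [0.0, 0.5, -0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
b2 = [0.0, 0.0, 0.0]

x = [0.2, 0.8, 0.4]
h = encode(x, W1, b1)   # compressed hidden representation (2 values)
y = decode(h, W2, b2)   # reconstruction with the same size as x
```

With untrained weights such as these, y will generally differ from x; training is what drives the reconstruction toward the input.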
The training of AE
A training set
For a training set of m samples, the overall cost function is
where
The weight W and bias b are updated with gradient descent as follows
where
where
where
where
where
Repeating equations (5) to (13) can make the output of AE equal to the input of AE by minimizing the overall cost function (equation (4)).
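The iterative update can be illustrated with a small, hedged sketch of batch gradient descent for a single AE. It assumes a sigmoid activation, a squared reconstruction error averaged over the samples, and no weight decay term; the function names, the learning rate alpha, and the toy data are hypothetical and do not reproduce equations (5) to (13).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Return hidden activations h and reconstruction y for one sample."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, y

def cost(data, W1, b1, W2, b2):
    """Mean over samples of 0.5 * ||y - x||^2 (assumed cost, no weight decay)."""
    total = 0.0
    for x in data:
        _, y = forward(x, W1, b1, W2, b2)
        total += 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)

def train_step(data, W1, b1, W2, b2, alpha):
    """One batch gradient-descent update of all weights and biases, in place."""
    n_in, n_hid, m = len(W2), len(W1), len(data)
    gW1 = [[0.0] * n_in for _ in range(n_hid)]
    gb1 = [0.0] * n_hid
    gW2 = [[0.0] * n_hid for _ in range(n_in)]
    gb2 = [0.0] * n_in
    for x in data:
        h, y = forward(x, W1, b1, W2, b2)
        # Error terms propagated backward through the sigmoid layers.
        d_out = [(y[i] - x[i]) * y[i] * (1 - y[i]) for i in range(n_in)]
        d_hid = [sum(W2[i][j] * d_out[i] for i in range(n_in)) * h[j] * (1 - h[j])
                 for j in range(n_hid)]
        for i in range(n_in):
            gb2[i] += d_out[i]
            for j in range(n_hid):
                gW2[i][j] += d_out[i] * h[j]
        for j in range(n_hid):
            gb1[j] += d_hid[j]
            for k in range(n_in):
                gW1[j][k] += d_hid[j] * x[k]
    # Step against the averaged gradient.
    for j in range(n_hid):
        b1[j] -= alpha * gb1[j] / m
        for k in range(n_in):
            W1[j][k] -= alpha * gW1[j][k] / m
    for i in range(n_in):
        b2[i] -= alpha * gb2[i] / m
        for j in range(n_hid):
            W2[i][j] -= alpha * gW2[i][j] / m

rng = random.Random(0)
n_in, n_hid = 4, 2
# Toy data with low-dimensional structure: each sample is [t, t, 1 - t, 1 - t].
data = [[t, t, 1 - t, 1 - t] for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [0.0] * n_hid
W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_in)]
b2 = [0.0] * n_in

cost_before = cost(data, W1, b1, W2, b2)
for _ in range(1000):
    train_step(data, W1, b1, W2, b2, alpha=0.5)
cost_after = cost(data, W1, b1, W2, b2)
```

Comparing cost_before with cost_after shows the reconstruction error falling as the updates are repeated, which is the behaviour the paragraph above describes.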
Improvement of AE
To learn feature information from complex input signals, the original data must be effectively compressed. Usually, this is done by making the number of neurons in the hidden layer smaller than the number of nodes in the input layer, so that the high-dimensional raw data are compressed in the hidden layer. This alleviates the difficulty that the AE has in learning the features of the original data.
Restricting the number of neurons reduces the dimension of the data, but the network can then learn only a few features in the hidden layer. To preserve the diversity of the features in the hidden layer, a method called sparse constraint is introduced in this study to improve AE. The main idea is not to reduce the number of neurons but to impose restrictions that limit the activity of the neurons, which also reduces the dimension of the input data. Accordingly, the original overall cost function (equation (4)) is modified to introduce an additional penalty factor, given by
where
where
The penalty term has the following property: if
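The exact penalty used in the original equation is not reproduced here. A common choice in sparse auto-encoders, assumed in this sketch, is the Kullback-Leibler (KL) divergence between a target sparsity rho and the mean activation rho_hat of each hidden neuron; it is zero when the two agree and grows as they move apart. The function names and the sample values are hypothetical.

```python
import math

def mean_activations(hidden_outputs):
    """Average activation of each hidden neuron over all training samples."""
    m = len(hidden_outputs)
    return [sum(sample[j] for sample in hidden_outputs) / m
            for j in range(len(hidden_outputs[0]))]

def kl_sparsity_penalty(rho, rho_hat):
    """Sum over hidden neurons of KL(rho || rho_hat_j) for Bernoulli variables.

    Assumes 0 < rho < 1 and 0 < rho_hat_j < 1 (true for sigmoid activations).
    """
    return sum(
        rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
        for r in rho_hat
    )

# Two samples, two hidden neurons: the first neuron is sparse, the second is not.
hidden_outputs = [[0.05, 0.9], [0.05, 0.7]]
rho_hat = mean_activations(hidden_outputs)
penalty = kl_sparsity_penalty(0.05, rho_hat)
```

In this toy case only the second neuron contributes to the penalty, because its mean activation is far from the assumed target of 0.05.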
In addition, to improve the ability of AE to identify multiple kinds of fault information, this article introduces random noise into the original data, so that the network can learn a more essential representation of the original data. The main idea is to set a small number of nodes in the input layer to zero with a small probability. However, the probability of introducing random noise should be appropriate; otherwise the noise may cause irreversible damage to the input data.
By stacking AEs layer by layer, we obtain an SAE, and the output of each layer in the network can be regarded as a representation of the characteristics of the original data in a different dimension. We add a softmax classifier after the last layer of the SAE to aggregate the learned features, so that fault diagnosis of the motor states can be achieved. The SAE structure in this article is shown in Figure 2.
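The noise-injection idea can be sketched as follows: each input node is set to zero independently with a small probability p. The function name and the constant toy input are hypothetical, and independent masking per node is an assumption; p = 0.1 is the value selected later in the parameter study.

```python
import random

def add_masking_noise(x, p, rng):
    """Return a copy of x in which each node is zeroed with probability p."""
    return [0.0 if rng.random() < p else xi for xi in x]

rng = random.Random(42)
x = [0.3] * 1000                      # toy input, not bearing data
x_noisy = add_masking_noise(x, 0.1, rng)
zero_fraction = x_noisy.count(0.0) / len(x_noisy)
```

During training, the corrupted vector would be fed to the network while the clean vector x remains the reconstruction target, which is what pushes the AE toward a more robust representation.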

Figure 2. The network structure of SAE.
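The final classification stage can be illustrated with a short softmax sketch. It assumes that the last layer produces one raw score per class and that the classes are labeled 1 to 4 (normal operation, inner ring fault, outer ring fault, ball fault), as in the simulation section; the score values are made up for illustration.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(scores):
    """Return the 1-based label of the most probable class."""
    probs = softmax(scores)
    return probs.index(max(probs)) + 1

scores = [0.1, 2.0, -1.0, 0.5]           # hypothetical outputs for one sample
probs = softmax(scores)
label = predict_class(scores)            # label 2 corresponds to an inner ring fault
```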
Motor bearing fault experiment
The experimental data in this article are from the Case Western Reserve University Bearing Data Center. 22 The motor bearing vibration signal is measured by the experimental system shown in Figure 3. The system consists of a 2 hp motor (left), a dynamometer (right), and associated control circuitry. The bearings are SKF bearings. The experimenters used an electric spark machine to introduce faults into the inner ring, the outer ring, and the ball of the bearing. They then placed a 16-channel accelerometer on the casing for data acquisition. The sampling frequencies are 12 and 48 kHz.

Figure 3. The bearing test system.
The vibration signal at the drive end (DE) of the bearing is selected for training the SAE. We use the data sampled at 12 kHz to train the SAE, as shown in Table 1.
The comparison of test results under a single training sample.
SAE: stacked auto-encoder network; AE: auto-encoder; DBN: deep belief network; SVM: support vector machine.
The collected data are used to plot the time domain and frequency domain curves of the bearing vibration signal, as shown in Figures 4 and 5.

Figure 4. The bearing vibration signal in the time domain: (a) normal operation, (b) inner ring fault, (c) outer ring fault, and (d) ball fault.

Figure 5. The bearing vibration signal in the frequency domain: (a) normal operation, (b) inner ring fault, (c) outer ring fault, and (d) ball fault.
Figure 4 shows that, although the time domain waveform can indicate the fault-related pulses in advance, in some cases (especially when a ball fault occurs) it suffers from particularly strong noise interference, which is not conducive to fault diagnosis. Figure 5 likewise shows that the spectral characteristics of the various fault types cannot be accurately distinguished because of noise interference. In summary, for complex bearing vibration signals, a more effective method is needed to improve the accuracy of fault diagnosis.
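A frequency domain curve of the kind discussed above can be obtained from a sampled signal with a discrete Fourier transform. The sketch below is illustrative only: it uses a synthetic 600 Hz sine wave rather than bearing data, assumes the 12 kHz sampling frequency used for training, and computes a naive DFT to stay dependency-free (a fast Fourier transform would normally be used in practice). The function name is hypothetical.

```python
import cmath
import math

def amplitude_spectrum(signal, fs):
    """Single-sided amplitude spectrum via a naive DFT.

    The 2/N scaling gives the sine amplitude for bins other than DC and Nyquist.
    """
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signal))
        freqs.append(k * fs / n)
        amps.append(2.0 * abs(coeff) / n)
    return freqs, amps

fs = 12000                                # sampling frequency in Hz
n = 240                                   # gives a 50 Hz frequency resolution
signal = [math.sin(2 * math.pi * 600 * t / fs) for t in range(n)]
freqs, amps = amplitude_spectrum(signal, fs)
peak = max(range(len(amps)), key=lambda k: amps[k])
peak_frequency = freqs[peak]              # location of the dominant spectral line
```

For a clean sine wave the spectrum has a single sharp line; real bearing signals with strong noise spread energy across many bins, which is why the fault types are hard to separate by inspection.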
Simulation and analysis
Parameter determination
The choice of SAE model has an important effect on the accuracy of fault diagnosis. The unsupervised learning effect of SAE is affected by the model parameters, such as the number of nodes in the input and hidden layers, sparse parameter

Figure 6. Reconstruction error curves for different SAE model parameters.
In principle, the more nodes in the input layer, the more information the model can learn. However, considering the complexity of the calculation, the number of input layer nodes cannot be increased arbitrarily, and an appropriate value should be selected. As can be seen from Figure 6(a), when the number of input layer nodes is increased from 100 to 500, the reconstruction error of SAE gradually falls to its lowest value. If the number of nodes is increased further, the reconstruction error does not change much. This result shows that SAE has the best learning effect when the number of input layer nodes is 500.
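With 500 input layer nodes, each training sample must contain 500 values. One plausible way to prepare such samples, assumed here and not stated in the article, is to cut the vibration signal into consecutive non-overlapping windows of 500 points and scale each window to the range [0, 1]. The function names and the synthetic signal are hypothetical.

```python
import math

def segment_signal(signal, window=500):
    """Split a 1-D signal into consecutive non-overlapping windows.

    Points left over at the end that do not fill a whole window are dropped.
    """
    count = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(count)]

def scale_to_unit_interval(sample):
    """Min-max scale one sample to [0, 1] (assumed preprocessing step)."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        return [0.0 for _ in sample]
    return [(v - lo) / (hi - lo) for v in sample]

# Synthetic stand-in for a vibration record of 1750 points.
signal = [math.sin(0.01 * i) for i in range(1750)]
samples = [scale_to_unit_interval(s) for s in segment_signal(signal, 500)]
```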
The number of nodes in the hidden layer determines the degree to which the model compresses the input data: the fewer the hidden layer nodes, the higher the degree of compression. Considering the complexity of the calculation, and based on experience, we build the SAE with three layers of AE. The numbers of hidden layer nodes in the last two layers are set to 50 and 25, respectively. We adjust the number of nodes in the first hidden layer and observe the change in the reconstruction error. It can be seen from Figure 6(b) that when the number of hidden layer nodes is smaller than 20% of the number of input layer nodes, the reconstruction error is within a reasonable range. This result shows that when the number of hidden layer nodes is small, the original data can be better compressed, which helps to reduce the complexity of SAE feature learning.
The sparse constraint is introduced to improve the capability of the SAE model. It can be seen from Figure 6(c) that when the value of
As can be seen from Figure 6(d), when the probability of introducing noise is in the range of 0 to 0.1, the reconstruction error decreases as the noise probability increases. However, as the noise probability increases further, the reconstruction error becomes larger. This result indicates that a reasonable noise probability helps to enhance the diversity of SAE learning, whereas too much noise destroys the characteristics of the original data and reduces the learning effect. Therefore, it is appropriate to set the noise probability to 0.1. Based on the above experimental analysis, the key parameters of the SAE are shown in Table 2.
The SAE network parameters.
SAE: stacked auto-encoder network.
Simulation analysis of fault diagnosis based on SAE
The propagation of data through the SAE can be seen as a process of reconstructing data features. When data pass through the hidden layers, they are compressed, which is equivalent to reducing the dimension of the data and reduces the complexity of the SAE. We use the data under working condition 1 for the experiment and apply SAE and principal component analysis (PCA) separately to learn features from the original data. Figure 7 shows the results.

Figure 7. Scatterplots of the features extracted by different methods: (a) features learned by PCA and (b) features learned by SAE.
It can be seen from Figure 7 that, through feature learning, SAE can easily distinguish the various types of fault data, whereas the features of the fault data learned by PCA overlap and cannot be distinguished. The experimental results show that the ability of deep learning to extract effective information from complex raw data is superior to that of traditional feature extraction methods.
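For reference, the PCA baseline can be sketched in a few lines. The version below is a simplified illustration that finds only the leading principal direction by power iteration on the sample covariance matrix and projects the centered data onto it; the scatterplots in the article presumably use more than one component. The function name and the toy data are hypothetical.

```python
import math

def first_principal_component(data, iters=200):
    """Return the leading principal direction and the projected scores.

    Uses power iteration on the sample covariance matrix and assumes the
    all-ones start vector is not orthogonal to the leading direction.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores

# Toy data lying exactly on the line y = 2x (not bearing data).
data = [[float(t), 2.0 * t] for t in range(5)]
direction, scores = first_principal_component(data)
```

Because PCA is a linear projection, it can only separate classes that differ along directions of high variance, which is one reason a nonlinear feature learner such as SAE may separate fault types that PCA cannot.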
To further verify the effectiveness of SAE in motor fault diagnosis, we feed the features obtained by SAE to the softmax classifier for training and finally obtain the fault type of the motor bearing as output. We carried out simulation analysis on the four different working conditions in Table 3. For each working condition, 1000 sets of data are selected as the training set and 200 sets of data as the test set. The fault diagnosis results of SAE are shown in Figures 8 and 9, where 1 indicates normal operation, 2 indicates inner ring fault, 3 indicates outer ring fault, and 4 indicates ball fault.
DE bearing working conditions.

Figure 8. SAE fault diagnosis results for case 1 and case 2.

Figure 9. SAE fault diagnosis results for case 3 and case 4.
It can be seen from Figure 8 that, of the 200 test samples under working condition 1, only three ball fault samples are misdiagnosed as inner ring faults, and the accuracy of fault diagnosis is as high as 98.5%. The accuracy of fault diagnosis under the other three working conditions reaches 94%, 96%, and 95.5%, respectively. The experimental results show that the constructed SAE has a high recognition ability for the fault diagnosis of the motor bearing. We also compare the proposed method with existing research. The methods used in these studies include AE, DBN, and SVM, and we apply them to fault identification under the same four working conditions. The experimental results are shown in Figure 10. As can be seen from the figure, SVM has the worst fault recognition capability when faced with complex systems. Because AE has only a single-layer structure, its ability to extract valid information from raw data is insufficient, and its recognition ability is not high. Although DBN has a deep neural network structure, its fault recognition capability still needs to be improved because it lacks a method for handling the random noise in the data. SAE overcomes the problems of the above methods, so its fault diagnosis effect is significantly better than that of the other methods.

Figure 10. The comparison of fault diagnosis results of various methods under different operating conditions.
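The accuracy reported for working condition 1 follows directly from the misclassification count: 3 errors in 200 test samples leave 197 correct, and 197/200 = 98.5%. The short sketch below reproduces this arithmetic; the even split of 50 test samples per class is an assumption made only to build the label lists.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Labels: 1 normal, 2 inner ring fault, 3 outer ring fault, 4 ball fault.
true_labels = [1] * 50 + [2] * 50 + [3] * 50 + [4] * 50   # assumed class split
predicted_labels = list(true_labels)
for i in range(150, 153):
    predicted_labels[i] = 2        # three ball faults misdiagnosed as inner ring

acc = accuracy(true_labels, predicted_labels)   # 197 correct out of 200
```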
In actual industrial processes, the load on the motor may change frequently. To verify the robustness of the SAE for motor bearing fault diagnosis, we use only the 1000 sets of data under working condition 1 as the training samples and use the data under the other working conditions as the test samples. The simulation results are shown in Table 3.
The experimental results in Table 3 show that fluctuation of the working condition leads to a decrease in the accuracy of fault identification. Under working condition 4, the recognition accuracy of SVM decreases the most, by 6.2%, whereas the fault recognition rate of SAE remains within an acceptable range. The experimental results show that the recognition ability of SAE has strong robustness.
Application prospect of SAE in thermal power plant
Thermal power generation plays an important role in economic development, so monitoring faults in the mechanical equipment of thermal power plants is also very important. The research object of this article is the motor, for which fault data can be generated by artificially introducing faults. A thermal power plant, however, is a complex and integrated system, so it is impossible to build an actual physical model to generate fault data.
Zeng et al. 25 established an accurate coal mill model based on mass balance and energy balance. Drawing on the idea of introducing faults to obtain fault data, model simulation can solve the problem that an actual physical coal mill model cannot be built. The SAE proposed in this article can then be used to realize fault diagnosis of the coal mill.
Looking further ahead, the method of this article can also be applied to the diagnosis of other important mechanical equipment in industry. First, an accurate mathematical model is established through mechanism analysis; then fault data are generated through simulation; and finally SAE is used to realize fault diagnosis of the equipment.
Conclusion
SAE is a deep neural network composed of stacked AEs. It is trained with an unsupervised learning method and learns the characteristics of the input data autonomously by reconstructing the original information, which overcomes the problem that traditional fault diagnosis methods need the fault data to be classified manually. However, fault diagnosis based on SAE also has its own defects, such as relying on experiments to determine the parameters of the network. Therefore, more in-depth theoretical research on the selection of network parameters is needed in the future.
Footnotes
Handling Editor: James Baldwin
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
