Abstract
In the absence of a priori knowledge, manual feature selection is too blind to find the sensitive features that effectively distinguish different fault states, and in practice it is difficult to obtain enough typical fault samples to train an intelligent classifier. This paper proposes a novel intelligent fault diagnosis method for rotating machinery based on feature selection and deep learning. In this method, deep neural networks are used both for feature extraction and for fault diagnosis. First, deep neural network 1 extracts features from the frequency spectrum of the original signal. In addition, the original vibration signal is decomposed into a series of intrinsic mode function components by empirical mode decomposition, the time-domain and frequency-domain statistical features of each intrinsic mode function component are calculated, and deep neural network 2 extracts further features from them. Second, the features extracted from the original signal spectrum and those extracted from the intrinsic mode function components are evaluated separately. After feature evaluation, the selected sensitive features are combined to construct a joint feature. Finally, the joint feature is put into deep neural network 3 to automatically recognize the different fault states of rotating machinery. The experimental results show that the proposed method, which integrates time-domain and frequency-domain statistical features, empirical mode decomposition, feature selection, and deep learning, can capture detailed fault information and select sensitive features from a large number of fault features. The method reduces the network size, improves the classification accuracy of mechanical fault diagnosis, and is robust.
Introduction
Rotating machinery is among the most common types of mechanical equipment and plays an important role in industrial production.1 To improve productivity and product quality, mechanical equipment must operate without faults. However, it often operates in complex working environments, and its components often fail. Once a component fails, it can set off a chain reaction that leads to failure of the whole machine. Fault diagnosis technology is an important means of avoiding mechanical failure and is of great practical significance to the normal operation of mechanical equipment.2,3
During the past decades, researchers have paid considerable attention to rotating machinery fault diagnosis, and many intelligent fault diagnosis approaches have been proposed in the literature.4–8 Samanta 4 extracted time-domain features and employed three optimized neural networks to detect pump faults. Samanta and Nataraj 5 utilized time-domain features to characterize bearing health conditions and employed artificial neural networks and a support vector machine (SVM) to diagnose bearing faults. Tran et al. 6 calculated features from thermal images based on bi-dimensional empirical mode decomposition (EMD) and then input the selected features into a relevance vector machine (RVM) for fault classification. Widodo et al. 7 calculated statistical features from the measured signals and used RVM and SVM to diagnose bearing faults. Bin et al. 8 utilized wavelet packets-EMD to extract fault features. The deep neural network (DNN) is an intelligent classifier that has emerged in recent years.9–16 Wang et al. 9 studied a novel convolutional neural network-based fault recognition method via image fusion of multi-vibration signals. Huang et al. 10 proposed a novel method called deep decoupling convolutional neural network for intelligent compound fault diagnosis. Kane and Andhare 11 used artificial intelligence techniques to identify gearbox condition, using psychoacoustic features in place of human hearing. Chen et al. 12 studied fault diagnosis of wind turbine gearboxes based on a wavelet neural network. Jia et al. 13 used DNN to extract fault features from massive data. Gan et al. 14 constructed a hierarchical diagnosis network based on DNN and applied it to the fault pattern recognition of rolling element bearings.
DNN has been applied in the field of fault diagnosis of mechanical equipment.15,16 The DNN is first pretrained by unsupervised layer-by-layer learning and then fine-tuned with the back propagation (BP) algorithm. The unsupervised process helps to mine the fault characteristics, and the supervised process helps to construct discriminative fault characteristics for classification. DNN pretraining uses a large number of unlabeled samples, and only a small number of labeled samples are needed in the fine-tuning stage to adjust the parameters of the DNN. The DNN therefore addresses the problem that a large number of typical fault samples is difficult to obtain in reality.
Feature selection has also been studied for diagnosis with a large number of fault features. Silva et al. 17 proposed a feature selection method built on several existing feature selection methods. Lei et al. 18 presented a fault diagnosis method for rotating machinery based on a new clustering algorithm using a compensation distance evaluation technique, and the diagnosis results demonstrated the effectiveness and practicability of the algorithm. Yang and co-workers19,20 applied feature evaluation technology to intelligent equipment fault diagnosis, and the results confirmed the validity of feature selection. Sawhney and Jeyasurya 21 combined two kinds of feature selection techniques with a feed-forward neural network for power system stability assessment, and the experimental tests achieved effective results.
The common intelligent fault diagnosis methods generally include two main steps: fault feature extraction using signal processing methods and fault classification using classifiers. These methods have two obvious deficiencies. (1) The extracted features contain not only a large number of fault features but also some redundant features. The redundant features do not contribute to the classification and can even reduce its accuracy. It is therefore necessary to select features before classification so as to screen out the sensitive ones, which reduces the network size and improves the accuracy of the classifier. (2) Supervised classifiers such as SVM22–24 and the back propagation neural network (BPNN)25,26 require a large amount of labeled data to adjust their parameters, but it is difficult to obtain a large number of typical fault samples in reality. Therefore, an unsupervised classifier with high accuracy and robustness, or a classifier that needs only a small amount of labeled data for training, is required.
This paper proposes a novel intelligent diagnosis method to overcome these two deficiencies in the fault diagnosis of rotating machinery. The method combines feature selection with DNN. The feature selection method uses the distance between features to select the sensitive ones. Its selection principle is that the distance within the same class should be the smallest and the distance between different classes should be the largest. Feature selection thus picks out the sensitive features that are good at recognizing the different fault states and removes the features that are not sensitive for classification. As a result, the scale of the neural network is reduced and the classification accuracy is improved. First, the original features are preselected, and then the original strong feature technology is used to select the final strong feature. The comparison results show that the proposed method not only improves the accuracy but also saves running time.
The rest of this paper is organized as follows: the next section briefly introduces the feature extraction method and the burgeoning feature extraction method based on the DNN. “Rotating machinery fault diagnosis process of the proposed method” section presents the proposed method for the rotating machinery fault diagnosis and its process. “Fault diagnosis using the proposed method” section describes two diagnosis cases of rolling element bearings and gearboxes to validate the proposed method, respectively. Conclusions are drawn in the final section.
Feature extraction, feature selection, and state recognition
Fault diagnosis is essentially a pattern recognition problem, which includes three parts: signal acquisition, feature extraction, and state recognition. Signal acquisition is the premise of fault diagnosis, feature extraction is the key, and state recognition is the core.27–29 Statistical features in the time domain include the maximum value, absolute mean, peak-to-peak value, root mean square (RMS), skewness, kurtosis, shape factor, crest factor, impulse factor, and margin factor. The frequency spectrum of the vibration signal shows how the amplitude of the signal is distributed over frequency. When the frequency components of the signal increase or decrease, the energy distribution of the spectrum becomes more dispersed or more concentrated. Through spectrum analysis, these characteristic parameters provide effective information for the fault diagnosis of rotating machinery. Common frequency-domain statistical features are the center of gravity of the frequency domain and the frequency-domain variance. The fast Fourier transform (FFT) spectrum analysis method is generally used to select the characteristic frequencies for state monitoring and fault diagnosis of rotating machinery.
Feature extraction based on DNN with EMD
In 2006, Hinton et al. 30 discussed the deep learning theory. Deep learning is designed to simulate the learning process of the brain, combined with a large number of training data to study the features of the data. Deep learning neural network can not only extract features but also recognize different states. In the process of condition monitoring and fault diagnosis of rotating machinery, there are still a lot of nonlinear and nonstationary signals. EMD is an effective nonstationary and nonlinear signal processing technique based on the local characteristic time scale of the signal.
This paper proposes a novel feature extraction method based on DNN with EMD. The DNN is composed of several denoising auto-encoders (DAEs). Figure 1 shows the structure of the DAE. A DAE is a three-layer unsupervised neural network that is divided into two parts: an encoding network and a decoding network. The input and output layers of the DAE have the same number of nodes. The encoding network reduces the high-dimensional input data to a low-dimensional coding vector, and the decoding network reconstructs the original input data from that coding vector. The encoding network introduces noise with certain statistical characteristics into the sample data and then codes the corrupted sample; the decoding network estimates the original, noise-free form of the sample from the corrupted data. In this way, the DAE learns from noisy samples to extract more robust features and becomes less sensitive to small random perturbations. The DAE principle is similar to that of the human sensory system, for example in observing objects. By adding noise during coding and reconstruction, the DAE can effectively reduce the influence of random factors on the extraction of health information and improve the robustness of the feature expression. Between the input layer and the output layer is the hidden layer, also called the "code word layer," which is the core of the whole DAE. The low-dimensional coding vector of the hidden layer reflects the nature of the high-dimensional input data. In other words, the low-dimensional code is another expression of the characteristics of the high-dimensional input.

Figure 1. Architectural graph of the denoising auto-encoder.
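The encode/decode behaviour described above can be illustrated with a minimal NumPy sketch of one DAE forward pass. It is not the authors' implementation: the masking noise (randomly zeroing inputs), the sigmoid activations, the mean squared reconstruction error, and all names (`corrupt`, `dae_forward`, `drop_prob`) are assumptions made for illustration, since the paper only states that noise with some statistical characteristics is added.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, drop_prob, rng):
    """Masking noise (assumed): zero each input entry with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def dae_forward(x, W_enc, b_enc, W_dec, b_dec, drop_prob, rng):
    """Corrupt the input, encode it, decode it, and compare with the CLEAN input."""
    x_noisy = corrupt(x, drop_prob, rng)
    code = sigmoid(x_noisy @ W_enc + b_enc)   # low-dimensional hidden code
    recon = sigmoid(code @ W_dec + b_dec)     # same size as the input layer
    error = np.mean((recon - x) ** 2)         # reconstruction error vs. clean x
    return code, recon, error

# Toy usage: 5 samples with 20 inputs, compressed to an 8-dimensional code.
rng = np.random.default_rng(0)
x = rng.random((5, 20))
W_enc, b_enc = rng.normal(0.0, 0.1, (20, 8)), np.zeros(8)
W_dec, b_dec = rng.normal(0.0, 0.1, (8, 20)), np.zeros(20)
code, recon, error = dae_forward(x, W_enc, b_enc, W_dec, b_dec, 0.3, rng)
```

The point of the sketch is that the reconstruction target is the clean input, not the corrupted one, which is what pushes the hidden code toward noise-robust features.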
At the same time, the vibration signals of rotating machinery are decomposed by EMD, and the time-domain and frequency-domain statistical features of the first eight intrinsic mode function (IMF) components are calculated. The time-domain feature parameters are the maximum value, absolute mean, peak-to-peak value, RMS, skewness, kurtosis, shape factor, crest factor, impulse factor, and margin factor; the frequency-domain parameters are the center of gravity of the frequency domain and the frequency-domain variance.
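As a rough illustration of the 12 statistical features per IMF component, the following NumPy sketch computes them for one signal. The paper does not give the formulas, so the definitions below are the commonly used textbook ones and should be read as assumptions; the function names and the use of the amplitude spectrum as the weighting for the frequency-domain features are likewise illustrative. The EMD step itself is not shown: `imfs` is assumed to be an array whose rows are already-computed IMF components.

```python
import numpy as np

def statistical_features(x, fs):
    """Return 10 time-domain and 2 frequency-domain features (assumed definitions)."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    peak = abs_x.max()
    abs_mean = abs_x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    centered = x - x.mean()
    std = centered.std()
    spectrum = np.abs(np.fft.rfft(x))                 # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    freq_center = np.sum(freqs * spectrum) / np.sum(spectrum)
    freq_var = np.sum((freqs - freq_center) ** 2 * spectrum) / np.sum(spectrum)
    return np.array([
        x.max(),                                      # maximum value
        abs_mean,                                     # absolute mean
        x.max() - x.min(),                            # peak-to-peak value
        rms,                                          # root mean square
        np.mean(centered ** 3) / std ** 3,            # skewness
        np.mean(centered ** 4) / std ** 4,            # kurtosis
        rms / abs_mean,                               # shape factor
        peak / rms,                                   # crest factor
        peak / abs_mean,                              # impulse factor
        peak / np.mean(np.sqrt(abs_x)) ** 2,          # margin factor
        freq_center,                                  # center of gravity of frequency
        freq_var,                                     # frequency-domain variance
    ])

def imf_feature_vector(imfs, fs):
    """Concatenate the 12 features of each IMF row (8 IMFs give 8 x 12 = 96 values)."""
    return np.concatenate([statistical_features(imf, fs) for imf in imfs])
```

With eight IMF components per sample, `imf_feature_vector` returns the 96-dimensional vector that the text feeds into DNN2.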
Feature selection
This paper uses a feature selection method that selects the sensitive features according to the distances between feature values. The selection principle is that the distance within the same class should be the smallest and the distance between different classes should be the largest. Features that conform to this principle are considered sensitive. In other words, the smaller the within-class distance and the larger the between-class distance of a feature, the more sensitive that feature is. The steps of the selection method are as follows:
Step 1: Calculate the inner distance of the i-th feature of the j-th class
Calculate the average value of the inner distance of the j-th class of the i-th feature
Step 2: Calculate the average value of the i-th feature over the N samples of the j-th class
Then, calculate the average value of the inter-class distance of the i-th feature over the M classes
Step 3: Calculate the evaluation factor of the i-th feature
The size of
Step 4: Sort the features from large to small in accordance with the value of
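The four steps can be sketched in NumPy as follows. The equations of the original are not reproduced here, so this sketch assumes one common form of distance evaluation: the within-class distance is the mean absolute pairwise difference inside each class, averaged over classes; the between-class distance is the mean absolute pairwise difference of the class means; and the evaluation factor is their ratio. The function names and the exact normalization are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def evaluation_factors(X, y):
    """X: (samples, features), y: class labels. Returns one factor per feature."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    within = np.zeros(X.shape[1])
    class_means = []
    for c in classes:
        Xc = X[y == c]
        n = Xc.shape[0]
        # Step 1: mean absolute pairwise distance inside class c, per feature
        pair = np.abs(Xc[:, None, :] - Xc[None, :, :])
        within += pair.sum(axis=(0, 1)) / (n * (n - 1))
        # Step 2 (first half): class mean of every feature
        class_means.append(Xc.mean(axis=0))
    within /= len(classes)                      # average over the M classes
    means = np.array(class_means)
    m = len(classes)
    # Step 2 (second half): mean absolute distance between class means
    between = np.abs(means[:, None, :] - means[None, :, :]).sum(axis=(0, 1)) / (m * (m - 1))
    # Step 3: large between-class and small within-class distance give a large factor
    return between / within

def select_sensitive(X, y, k):
    """Step 4: sort features by factor, largest first, and keep the top k indices."""
    factors = evaluation_factors(X, y)
    order = np.argsort(factors)[::-1]
    return order[:k], factors
```

Under these assumptions, a feature whose values cluster tightly within each class and differ strongly between classes receives a large factor and is ranked first.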
State recognition
In this paper, DNN is used as the state recognition classifier. DNN has a strong capacity for nonlinear expression and good discriminative ability, and building the model does not require a large amount of labeled data. This paper uses a stacked denoising auto-encoder neural network. The following subsections describe the working principle of the denoising auto-encoder and the pretraining and fine-tuning of the DNN.
DAE working principle
The structure of the denoising auto-encoder is shown in Figure 1. As shown in Figure 2, given an unlabeled training sample set

Figure 2. Schematic of a denoising auto-encoder.
The decoding network transforms each
DAE completes the training of the whole network by minimizing the reconstruction error between
Pretraining and fine-tuning of DNN
The DNN is first trained by an unsupervised method, which helps it to capture the fault characteristics of the input signal effectively. The DNN is then adjusted by a supervised learning method to optimize its expression of the fault characteristics. This paper uses the denoising auto-encoder as the unsupervised algorithm for pretraining and the BP algorithm as the supervised algorithm for the fine-tuning stage.
Pretraining is an unsupervised learning from bottom to top. First, the first layer is trained with the training sample data, and the connection weights and the bias parameters of the first layer are obtained. By the principle of denoising auto-encoder, DAE model can learn the structure of the data itself and identify the characteristics of the input. The output of the first layer is used as the input of the second layer. Train the second layer to get the connection weights and bias parameters of the second layer, and so on. The final features are reconstructed based on the learning of multiple layers.
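The bottom-up procedure described above can be sketched as greedy layer-wise pretraining. This is an illustrative NumPy sketch under several assumptions that the paper does not specify: masking noise, sigmoid units, a squared reconstruction error, full-batch gradient descent, and hypothetical names and hyperparameters (`train_dae`, `pretrain_stack`, `drop_prob`, `lr`, `epochs`). The supervised BP fine-tuning stage is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(data, n_hidden, rng, drop_prob=0.2, lr=0.5, epochs=200):
    """Train one denoising auto-encoder; return only the encoder weights and bias."""
    n_samples, n_in = data.shape
    W1, b1 = rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, n_in)), np.zeros(n_in)
    for _ in range(epochs):
        noisy = data * (rng.random(data.shape) >= drop_prob)   # corrupt the input
        h = sigmoid(noisy @ W1 + b1)                           # encode
        r = sigmoid(h @ W2 + b2)                               # decode
        # Gradients of 0.5 * mean squared error against the CLEAN data
        d_r = (r - data) * r * (1.0 - r) / n_samples
        d_h = (d_r @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ d_r)
        b2 -= lr * d_r.sum(axis=0)
        W1 -= lr * (noisy.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return W1, b1

def pretrain_stack(data, hidden_sizes, rng):
    """Train layer k on the output of layer k-1, from the bottom up."""
    layers, layer_input = [], data
    for n_hidden in hidden_sizes:
        W, b = train_dae(layer_input, n_hidden, rng)
        layers.append((W, b))
        layer_input = sigmoid(layer_input @ W + b)   # becomes the next layer's input
    return layers

def encode(data, layers):
    """Pass data through all pretrained encoder layers."""
    out = data
    for W, b in layers:
        out = sigmoid(out @ W + b)
    return out
```

For a network such as 96–60–30–10, `hidden_sizes` would be `[60, 30]`; the final classification layer would be added and trained during fine-tuning with labeled data.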
Fine-tuning is supervised learning from top to bottom. With the labeled training data, the error is propagated from the top layer to the bottom layer to fine-tune the DNN. As shown in Figure 3, this method uses the BP algorithm to fine-tune the DNN parameters. The output of DNN is expressed as

Figure 3. Pretraining and fine-tuning of the DNN.
Assume that
Rotating machinery fault diagnosis process of the proposed method
The proposed method extracts fault features from the vibration signal of rotating machinery from two angles. First, the vibration signal is transformed with the FFT, and the frequency spectrum is put into DNN1 to extract features. Second, the vibration signal is decomposed with the EMD method, the time-domain and frequency-domain statistical features of each IMF component are calculated, and these statistical features are put into DNN2 to extract further features. The two sets of extracted features are evaluated separately, and the sensitive features are selected. The sensitive features are then put into DNN3 to recognize and diagnose the fault state. DNN1 and DNN2 mainly use the denoising auto-encoder to extract features, which reduces the size of DNN3, whereas DNN3 mainly performs classification. The flow chart is shown in Figure 4. The rolling bearing test platform of West University is shown in Figure 5.

Figure 4. Flow chart of rotating machinery fault diagnosis based on feature selection and deep learning. DNN: deep neural network; EMD: empirical mode decomposition; FFT: fast Fourier transform.

Figure 5. Rolling bearing test platform of West University.
The specific steps are as follows:
Step 1. Signal acquisition: Acquire the vibration signal of the rotating machinery under different conditions.
Step 2. Feature extraction: Transform the vibration signal of each state with the FFT and put the resulting spectrum into DNN1 to extract features. The extracted features are called feature set 1. Decompose the vibration signal of each state with the EMD method to obtain a number of IMF components. Calculate the time-domain and frequency-domain statistical features of each IMF component, and put these statistical features into DNN2 to extract further features. The further extracted features are called feature set 2.
Step 3. Feature selection: Apply feature selection based on the distance estimation algorithm to feature set 1 and feature set 2 separately. Then pick out the sensitive features to form feature set 3.
Step 4. Condition recognition: Put the selected sensitive feature set 3 into DNN3 for intelligent fault diagnosis of the rotating machinery.
Fault diagnosis using the proposed method
Rolling element bearings and gears are key components of rotating machinery.31,32 The health conditions of these components often affect the performance, reliability, and service life of the machinery. However, owing to harsh working environments, these components easily suffer from different kinds of damage, which leads to breakdowns and heavy economic losses.33,34 In this section, two diagnosis cases, one on rolling element bearings and one on gearboxes, are used to validate the proposed method.
Case 1: Fault diagnosis of rolling element bearings
Data description
The experimental parameters are provided in Table 1. And the ball pass frequency on outer race (BPFO) and the ball pass frequency on inner race (BPFI) are calculated with shaft speed
Table 1. Experimental parameters.
The bearing data used here are provided by Case Western Reserve University.35 The data were collected from a motor-driven mechanical system under a 3 HP load with a sampling frequency of 48 kHz. The bearing dataset was obtained from the experimental system (1) under normal condition, (2) with outer race fault, (3) with inner race fault, and (4) with roller fault. The faults were introduced into the drive end bearing of the motor with fault diameters of 0.18, 0.36, and 0.54 mm. Three fault types with three fault diameters each, together with the normal condition, give 10 different working conditions. For each working condition, the collected data are divided into 120 samples of 4000 points each. For each working condition, 60 samples are randomly selected as training samples and the remaining 60 samples are used as test samples, as shown in Table 2.
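The sample preparation for one working condition can be sketched as follows. The paper does not state whether the 120 samples overlap, so this sketch assumes non-overlapping consecutive segments; the function names and the use of a seeded NumPy random generator are also illustrative assumptions.

```python
import numpy as np

def segment_signal(signal, n_samples=120, sample_len=4000):
    """Cut a 1-D record into n_samples consecutive, non-overlapping segments (assumed)."""
    signal = np.asarray(signal)
    needed = n_samples * sample_len
    if signal.size < needed:
        raise ValueError(f"record has {signal.size} points, need at least {needed}")
    return signal[:needed].reshape(n_samples, sample_len)

def random_split(samples, n_train=60, rng=None):
    """Randomly pick n_train rows for training; the remaining rows form the test set."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(samples.shape[0])
    return samples[order[:n_train]], samples[order[n_train:]]
```

Applied to each of the 10 working conditions, this yields 600 training samples and 600 test samples in total.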
Table 2. Detailed list of the test and training samples.
Experimental configurations and diagnosis results
The size of the DNN input layer is determined by the dimension of the input feature vector, and the size of the output layer is determined by the number of classification labels. Each sample has 4000 time-domain points, and the spectrum obtained by the Fourier transform is symmetrical, so the first 2000 points of the spectrum are taken as the input vector of DNN1. In this paper, the network structure of DNN1 is set to 2000–500–300–200–10. The selected samples are decomposed with the EMD method. Because the main fault information is concentrated in the first few IMF components, this paper uses the first eight IMF components. The time-domain and frequency-domain statistical features of each IMF component are calculated (8 × 12 = 96) and put into DNN2 as the input feature vector. The network structure of DNN2 is set to 96–60–30–10. The 200 features extracted by the third hidden layer of DNN1 are called feature set 1; the 30 features extracted by the second hidden layer of DNN2 are called feature set 2.
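The choice of 2000 input points follows from the conjugate symmetry of the Fourier transform of a real signal: for an N-point sample, the amplitude at index k equals the amplitude at index N − k, so only the first N/2 points carry independent information. A minimal NumPy sketch (the function name `half_spectrum` is illustrative and not taken from the paper):

```python
import numpy as np

def half_spectrum(sample):
    """Amplitude spectrum of a real sample, keeping only the first N/2 points."""
    sample = np.asarray(sample, dtype=float)
    spectrum = np.abs(np.fft.fft(sample))
    return spectrum[: sample.size // 2]

# A 4000-point sample gives a 2000-point input vector for DNN1.
n = np.arange(4000)
sample = np.sin(2 * np.pi * 100 * n / 4000)   # tone that falls exactly on bin 100
dnn1_input = half_spectrum(sample)
```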
Then we evaluate the feature set 1 and the feature set 2 with the distance estimation algorithm. The evaluation factor of each feature of feature set 1 and feature set 2 is shown in Figure 6. Sort the features from large to small in accordance with the value of

Figure 6. (a) Evaluation factors of 200 features of feature set 1 and (b) evaluation factors of 30 features of feature set 2.
Figure 7 shows that the classification accuracy is highest when the input consists of the first 50 sensitive features of feature set 1 and the first 10 sensitive features of feature set 2. We therefore choose these 50 sensitive features as feature set 1s and these 10 sensitive features as feature set 2s. Feature set 1s and feature set 2s together constitute the joint feature, which is called feature set 3. Feature set 3 is then put into DNN3 for classification. The network structure of DNN3 is set to 60–40–30–10. Each feature set should be normalized before it is put into the DNN, because normalization improves the classification accuracy and reduces the calculation time. The different input feature sets are put into the DNN to obtain their respective classification accuracies. Each test is repeated five times to obtain the average classification accuracy. The results are shown in Table 3.
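The construction of the joint feature can be sketched as follows. The paper does not say which normalization is used, so min-max scaling with statistics taken from the training set is an assumption here; the function names and the index arrays `idx1` and `idx2` (the ranked indices of the sensitive features) are hypothetical.

```python
import numpy as np

def min_max_normalize(train, test):
    """Scale each feature to [0, 1] using the training set's minimum and maximum."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)    # avoid division by zero for constant features
    return (train - lo) / span, (test - lo) / span

def joint_feature(set1, set2, idx1, idx2):
    """Concatenate the selected columns of feature set 1 and feature set 2."""
    return np.hstack([set1[:, idx1], set2[:, idx2]])
```

With 50 indices selected from the 200 features of feature set 1 and 10 from the 30 features of feature set 2, `joint_feature` returns 60 columns, which matches the 60-node input layer of DNN3.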

Figure 7. (a) The relationship between the classification accuracy and the number of input features of feature set 1 and (b) the relationship between the classification accuracy and the number of input features of feature set 2.
Table 3. Fault diagnosis results of different input feature sets.
The results in Table 3 show that the DNN classifies feature set 1 with an accuracy of 82.67% and feature set 2 with an accuracy of 78%, whereas it classifies feature set 1s with an accuracy of 92.33% and feature set 2s with an accuracy of 89.17%. Feature set 1 and feature set 2 contain not only a large number of fault features but also a large number of redundant features. If they are put directly into the classifier, the classification process is slow and the classification accuracy is reduced. Therefore, feature selection should be performed before classification so as to screen out the sensitive features, which reduces the network size and improves the classification accuracy of the DNN. Feature set 1s and feature set 2s are fault features of the rotating machinery vibration signals extracted from two different angles, and different input feature sets usually show complementary classification characteristics. This paper therefore combines feature set 1s and feature set 2s to form the joint feature set 3. When feature set 3 is put into DNN3 for classification, the accuracy is 98.5%, which is higher than the accuracy obtained with feature set 1s or feature set 2s alone.
Case 2: Fault diagnosis of gearboxes
Experiments and data description
The gearbox data used here were collected on the multi-stage gear transmission system experimental bench shown in Figure 8. The test bench can simulate a variety of gearbox faults, such as gear wear, broken gear teeth, gear shaving, tooth root cracks, and eccentric gears. The experimental parameters are shown in Table 4, and the six operating states of the test gear are shown in Table 5. The original vibration signal is measured with an acceleration sensor under the six operating states of the test gear. The time-domain signal and the frequency-domain signal of the original vibration are shown in Figure 9.

Figure 8. Multi-stage gear transmission system experimental bench.
Table 4. Experimental parameters of the multi-stage gear transmission system experimental bench.

Figure 9. Time-domain waveform and frequency spectrum of the original vibration signal.
Table 5. Six operating states of the test gear.
Experimental configurations and diagnosis results
The DNN input layer is determined by the number of input feature vectors, and the output layer is determined by the classification label. As the original vibration signal spectrum has 350 points and the state label is 6, the network structure of DNN1 is set to 350–250–150–6. Decompose the selected samples with the EMD method and select the front eight IMF components. Calculate the time-domain and frequency-domain statistical features of each IMF component (8 × 12 = 96). And the network structure of DNN2 is set to 96–60–30–6. The 150 features which are extracted by the DNN1 second hidden layer are called feature set 1; the 30 features which are extracted by the DNN2 second hidden layer are called feature set 2. Evaluate the feature set 1 and the feature set 2 with the distance estimation algorithm. Then we choose the 42 sensitive features as the feature set 1s and the 11 sensitive features as the feature set 2s. Feature set 1s and feature set 2s constitute the joint feature which is called the feature set 3. Then put the feature set 3 into the DNN3 for classification. The network structure of DNN3 is set to 53–40–30–6. Put the different input feature sets into the DNN to obtain the classification accuracy, respectively. Repeat the test five times to get the average classification accuracy. The results are shown in Table 6.
Table 6. Fault diagnosis results of different input feature sets.
To verify the superiority of the DNN, the classification accuracy of different classifiers is compared. Feature set 3 is put into BPNN and SVM, respectively, and the test is carried out 15 times. The structure of BPNN is set to 53–40–30–10. SVM uses the radial basis kernel function, and its kernel parameter and penalty factor are optimized by the particle swarm optimization algorithm. The average accuracy of the 15 tests and the corresponding standard deviation are calculated, and the results are shown in Table 7.
Table 7. Fault diagnosis results of different classifiers.
BPNN: back propagation neural network; DNN: deep neural network; SVM: support vector machine.
Among the three classifiers, the DNN has the highest accuracy and the most stable diagnosis. The training principle of BPNN differs from that of DNN: DNN is trained layer by layer, whereas BPNN is not. As a result, BPNN training sometimes converges and sometimes falls into a local optimum, which makes its diagnosis accuracy fluctuate strongly. The SVM classifier can be accurate and stable for diagnosis, but unlike DNN it cannot reduce the dimension of the input feature vector for further feature learning. Compared with DNN, SVM has worse generalization ability.
The SVM is an effective way to realize the structural risk minimization principle, and it provides a way to solve classification problems with small sample sizes and nonlinearity. The basic idea of SVM is to raise the dimension and linearize: it defines the optimal linear hyperplane and reduces the search for that hyperplane to a convex programming problem. Then, based on the Mercer kernel expansion theorem, the sample space is mapped by a nonlinear mapping to a high-dimensional or even infinite-dimensional feature space, so that a linear learning machine can be used in the feature space to solve highly nonlinear classification and regression problems in the sample space. Suppose a training set with m samples

Optimal hyper plane of the SVM.
Define two standard hyper planes
Discussion
EMD is an adaptive signal processing method that is well suited to nonlinear and nonstationary signals. The time-domain and frequency-domain statistical features of each IMF component can effectively characterize the rotating machinery under different fault states. Feature extraction yields many fault features of the signal, but it also yields many useless features. Feature selection picks out the sensitive features that are good at recognizing the different fault states and removes the features that are not sensitive for classification. As a result, the scale of the network is reduced and the classification accuracy is improved. DNN pretraining uses a large number of unlabeled samples for unsupervised training, and only a small number of labeled samples are needed in the fine-tuning stage to adjust the parameters of the DNN. The DNN therefore addresses the problem that a large number of typical fault samples is difficult to obtain in reality. Compared with BPNN, DNN uses the denoising auto-encoder to train the network layer by layer, so the connection weights and bias parameters start closer to the global optimum, which avoids the tendency of BPNN to fall into a local optimum and results in higher classification accuracy. Compared with SVM, DNN has a multi-layer network structure that can reduce the dimension of the input features, which is the advantage of feature learning. The DNN therefore has an advantage in generalization performance.
Conclusions
This paper proposed a novel intelligent fault diagnosis method for rotating machinery based on feature selection and deep learning. The effectiveness of the proposed method was verified using datasets from rolling element bearings and gearboxes. To deal with the nonstationary property of fault vibration signals, EMD is employed to provide representative features. The results show that the time-domain and frequency-domain statistical features of each IMF component can effectively characterize the rotating machinery under different fault states. To reduce the scale of the network and improve the classification accuracy, the feature selection method is employed to select the sensitive features. Compared with the SVM and the BPNN, the DNN has the highest accuracy and the most stable diagnosis. The DNN is used not only for feature extraction but also for fault diagnosis, and it has the advantage that only a small number of labeled samples are needed in the fine-tuning stage to adjust its parameters. This method is of great significance to intelligent fault diagnosis.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (Grant No. 51475407, 51875500), Natural Science Foundation of Hebei Province of China No. E2015203190, and Key project of Natural Science Research in Colleges and Universities of Hebei Province of China No. ZD2015050.
