Abstract
As one of the key parts of rotary machines, rolling bearings require fault diagnosis and running condition monitoring to ensure normal operation and safe production. However, traditional diagnosis approaches rely on manual feature extraction and domain expertise, while existing convolutional neural networks (CNNs) suffer from low fault recognition rates. This paper proposes a novel one-dimensional convolutional neural network (ODCNN) for the automatic fault diagnosis of rolling bearings. It adopts six sets of convolutional and max-pooling layers to extract signal features and applies a flattening convolutional layer followed by two fully-connected layers for feature classification. The architectures of one-dimensional LeNet-5, AlexNet, and the proposed ODCNN are described in detail. Training and testing samples are then obtained by slicing the vibration signals of rolling bearings with overlap. Finally, a classification experiment is carried out. The experimental results show that the ODCNN achieves higher fault diagnosis rates and maintains high accuracy under varying loads. Additionally, the features extracted by the three CNNs are visualized, which illustrates that the new CNN has a better classification capacity.
Keywords
Introduction
Rolling bearings are a pivotal part of mechanical equipment, and their running and health status has a vital impact on the performance of machines. 1 However, the working environment of rolling bearings often involves high temperature, high running speed, and complex alternating external loads, which makes them very vulnerable to failure. 2 Therefore, the fault diagnosis and condition monitoring of rolling bearings are of great significance for ensuring normal operation and safe production.
The procedure of traditional fault diagnosis mainly comprises the extraction, selection, and classification of features from the bearing vibration signals. 3 In the feature extraction stage, features related to fault characteristics are extracted from the original vibration signals for subsequent fault recognition. The features of rolling bearing vibration signals can be extracted from the time domain, frequency domain, and time-frequency domain. 4 Statistical approaches 5 such as Root Mean Square (RMS) 6 and Kernel Density Estimation (KDE) 7 are usually used in the time domain; the Fast Fourier Transform (FFT) is often used for frequency domain analysis 8 ; and for time-frequency analysis, the most popular methods are the Short-Time Fourier Transform (STFT), 9 Wavelet Packet Transform (WPT), 10 Empirical Mode Decomposition (EMD), 11 and its variations. 12 After feature extraction, feature selection is carried out to remove insensitive and useless features. The commonly used methods include Principal Component Analysis (PCA) 13 and Independent Component Analysis (ICA). 14 Finally, in the feature classification step, Artificial Neural Network (ANN) algorithms, 15 the k-Nearest Neighbor (kNN) method, 16 and the Support Vector Machine (SVM) approach 17 are often employed to classify fault types. These traditional approaches have been widely used for fault diagnosis of rolling bearings. However, their application raises several issues: (1) the extraction of fault features is difficult and the classification accuracy depends on the signal processing techniques, which rely on cumbersome manual extraction and solid domain expertise 18 ; (2) the coupling among signal pretreatment, feature extraction, and fault classification is broken when these steps are designed in isolation, resulting in a partial loss of fault information. 19
In contrast to traditional fault diagnostic methods, deep learning is a developing artificial intelligence technique that can learn the diagnostic information in vibration signals directly, without tedious denoising preprocessing or manual feature extraction. By using a deep convolutional neural network (CNN), the deep learning scheme becomes an end-to-end diagnostic method that combines feature extraction and feature classification into one step. Wen 20 discussed a novel CNN that extends the LeNet-5 network structure and converts the vibration signals to 2D images as input to extract the features of rolling bearings. Hoang 21 transformed the vibration signals of bearings into gray-scale images, used them as input data of the proposed CNN to detect bearing faults, and concluded that this approach can achieve very high accuracy and robustness under noisy environments. However, the bearing vibration data collected by an accelerometer form a one-dimensional time series in which the data points are correlated with their neighbors in time. If they are converted directly into two-dimensional form, this correlation in the original vibration signals will be destroyed, which may lead to the loss of fault-related information. 22 Therefore, some researchers have attempted to construct one-dimensional CNNs that take the raw vibration signals directly as input. Janssens 23 proposed a shallow CNN model, consisting of only one convolutional layer with wide kernels followed by a fully connected layer, to monitor the health condition of rolling bearings. To realize the intelligent fault diagnosis of rolling bearings, Sadoushi 24 presented a novel CNN in which the signal processing is treated as the first layer, and carried out a comparative performance study among deep learning-based methods, this new CNN, and traditional machine learning. Zhang 25 proposed a 5-layer CNN to detect faults of rolling bearings, in which the kernels in the first convolutional layer are wide while those in the following layers are narrow. However, these approaches are not suitable for varying load conditions, and their fault recognition rates are not satisfactory.
To further improve the fault diagnosis ability for rolling bearings, this paper investigates a new one-dimensional convolutional neural network (ODCNN). The rest of this paper is organized as follows. The network architectures and parameters of one-dimensional LeNet-5 (1D-LeNet-5), one-dimensional AlexNet (1D-AlexNet), and the proposed ODCNN are presented in Section 2. In Section 3, the experimental apparatus is introduced and the vibration signals of rolling bearings are pre-processed by overlapping segmentation to obtain samples for training and testing. Finally, the classification accuracy is calculated and a feature visualization technique is applied to compare the classification capacity of the three approaches, before conclusions are drawn in Section 5.
Theory of CNN
The classical CNN consists of convolutional layers, pooling layers, and fully-connected layers. In this section, the theory of the CNN is introduced in detail.
Convolutional layer
The convolutional layer is the core structure of a CNN. Its function is to extract different features of the input signal through different observation modes (convolution kernels), each of which responds to a specific pattern in the input. To extract different features from the input signal, a convolutional layer usually has several convolution kernels, and each kernel is applied in its own convolution operation. Since the same convolution kernel shares its parameters across all positions of the input, each kernel learns one class of features, and its output is called a feature map. In its standard form, the convolution operation is defined as follows:
$$y_i = K_i * x + b_i$$
where $x$ is the input signal, $K_i$ and $b_i$ are the weights and bias of the $i$-th convolution kernel, $*$ denotes the one-dimensional convolution operation, and $y_i$ is the resulting $i$-th feature map.
To overcome the limited expressive ability of a linear model, an activation function is applied as a nonlinear transformation to the features obtained from the convolution operation. The sigmoid and tanh activation functions commonly used in traditional neural networks are prone to the vanishing gradient phenomenon, which prevents the network structure from being deepened. In recent years, the unsaturated nonlinear function ReLU (rectified linear unit) has been widely applied in CNNs as the activation function, because it converges faster than the traditional saturating nonlinear functions during gradient-descent training. The expression of the ReLU activation function is as follows
$$a_i = f(y_i) = \max(0, y_i)$$
where $y_i$ is the output of the convolution operation and $a_i$ is the activation of $y_i$.
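The convolution and ReLU operations described in this subsection can be illustrated with a minimal pure-Python sketch. The function names, the example signal, and the hand-picked kernel are illustrative assumptions, not the authors' implementation; as in most CNN libraries, the kernel is slid over the signal without being flipped, and no padding is used.

```python
def conv1d(signal, kernel, bias=0.0, stride=1):
    """Unpadded 1D convolution in the CNN sense (sliding dot product)."""
    k = len(kernel)
    out = []
    # Slide the kernel over every position where it fits entirely.
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, window)) + bias)
    return out


def relu(values):
    """Rectified linear unit: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


# Hypothetical input and kernel, chosen only to make the arithmetic easy to follow.
signal = [0.0, 1.0, 3.0, 2.0, -1.0, -4.0, 0.0, 2.0]
edge_kernel = [1.0, 0.0, -1.0]

feature_map = conv1d(signal, edge_kernel)
activated = relu(feature_map)
print(feature_map)
print(activated)
```

Because the same three weights are reused at every position, the layer has only as many parameters as the kernel is long, which is the parameter sharing mentioned above.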
Pooling layer
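The purpose of the pooling layer is to reduce the number of neurons in the network through the pooling operation. After pooling, the number of connections between convolutional layers is reduced and the computation is accelerated. The commonly used pooling methods are maximum pooling (taking the maximum value in the local receptive field) and average pooling (averaging all values in the local receptive field). They can be defined as follows
$$p_{\max}(j) = \max_{t \in R_j} a(t), \qquad p_{\mathrm{avg}}(j) = \frac{1}{|R_j|} \sum_{t \in R_j} a(t)$$
where, $a(t)$ is the value of the $t$-th neuron in the feature map, $R_j$ is the $j$-th local receptive field, $|R_j|$ is its width, and $p_{\max}(j)$ and $p_{\mathrm{avg}}(j)$ are the corresponding outputs of the maximum and average pooling operations.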
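As a small illustration of the two pooling methods, the following sketch applies maximum and average pooling to a short list of activations. The function name `pool1d`, its parameters, and the example values are assumptions made for this illustration only.

```python
def pool1d(values, size, stride, mode="max"):
    """Unpadded 1D pooling over windows of `size` values taken every `stride` steps."""
    out = []
    for start in range(0, len(values) - size + 1, stride):
        window = values[start:start + size]
        if mode == "max":
            out.append(max(window))          # strongest response in the window
        elif mode == "avg":
            out.append(sum(window) / size)   # mean response in the window
        else:
            raise ValueError("mode must be 'max' or 'avg'")
    return out


# Hypothetical activations, for example the output of a ReLU layer.
activations = [0.0, 0.0, 4.0, 6.0, 0.0, 0.0, 2.0, 1.0]

max_pooled = pool1d(activations, size=2, stride=2, mode="max")
avg_pooled = pool1d(activations, size=2, stride=2, mode="avg")
print(max_pooled)
print(avg_pooled)
```

Both variants halve the length of the feature map here; maximum pooling keeps the strongest response in each window, while average pooling smooths it.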
Fully-connected layer
The fully-connected layers are the final layers of a CNN. They are located behind the pooling layers, and each neuron is fully connected with all the neurons in the previous layer. The function of the fully-connected layers is to map the multi-dimensional features obtained from the convolution and pooling operations to the sample space and complete the final classification. As the dimension of the input feature map has been greatly reduced after several rounds of convolution and pooling, the fully-connected layers do not add much computation time. The sigmoid function is used for two-class problems, and the Softmax function can be used for multi-class problems:
$$q_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$
where $z_j$ is the output of the $j$-th neuron of the last layer, $K$ is the number of classes, and $q_j$ is the predicted probability of class $j$.
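The corresponding loss function, usually the cross-entropy between the target distribution and the Softmax output, can be defined as
$$L = -\sum_{j=1}^{K} p_j \log q_j$$
where, $p_j$ is the target probability of class $j$ (1 for the true class and 0 otherwise), $q_j$ is the probability of class $j$ predicted by the Softmax function, and $K$ is the number of classes.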
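The classification step can be sketched numerically as follows. The sketch assumes the cross-entropy loss that is commonly paired with a Softmax output; the function names and the example scores are hypothetical and only serve to show the arithmetic.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to one."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical scores for a three-class problem; the true class is class 0.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
one_hot = [1.0, 0.0, 0.0]
loss = cross_entropy(one_hot, probs)
print(probs, loss)
```

With a one-hot target, the loss reduces to the negative logarithm of the probability assigned to the true class, so it falls toward zero as that probability approaches one.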
Network architecture
In this section, the network architectures and parameters of 1D-LeNet-5, 1D-AlexNet, and the proposed ODCNN are introduced in detail.
1D-LeNet-5 architecture
The structure of 1D-LeNet-5 is shown in Figure 1. It consists of two convolutional layers and two average-pooling layers, followed by three fully-connected layers. The convolutional and average-pooling layers extract features from the input signals, while the fully-connected layers classify the extracted features. 26

The architecture of 1D-LeNet-5.
The first layer of 1D-LeNet-5 is a convolutional layer that receives the input, with filters of size 50 and a stride of 1. The next layer is an average-pooling layer (also called a sub-sampling layer) with a kernel size of 2 and a stride of 1. Layers 3 and 4 are a similar pair of convolutional and average-pooling layers with the same parameters as the first two layers. The output stage of 1D-LeNet-5 consists of three fully-connected layers. The first two (layers 5 and 6) have 200 units each, and every unit in the sixth layer is connected to all nodes in the fifth layer. The last layer is a Softmax classifier with 10 units, which is used to recognize the labels from 0 to 9. The detailed structural parameters of 1D-LeNet-5 are given in Table 1.
Parameters of 1D-LeNet-5.
1D-AlexNet architecture
1D-AlexNet 27 is much larger and deeper than 1D-LeNet-5: it contains five convolutional layers, three max-pooling layers, and three fully-connected layers.
Convolutional layers 1 (kernel size 50, stride 4) and 2 (kernel size 50, stride 1) are each followed by a max-pooling layer with a kernel size of 3 and a stride of 2. Convolutional layers 3, 4, and 5 are connected directly to one another, each with a kernel size of 2 and a stride of 1. The fifth convolutional layer is then followed by a max-pooling layer with a kernel size of 3 and a stride of 2. The output stage of 1D-AlexNet consists of three fully-connected layers. The first two (layers 9 and 10) have 200 units each, and every unit in the 10th layer is connected to all nodes in the ninth layer. The last layer is a Softmax classifier with 10 units, which is used to recognize the labels from 0 to 9. The structure of 1D-AlexNet is shown in Figure 2 and its detailed structural parameters are listed in Table 2.

The architecture of 1D-AlexNet.
Parameters of 1D-AlexNet.
ODCNN architecture
The architecture of the proposed ODCNN, which is deeper than traditional CNNs, is illustrated in Figure 3. It consists of six groups of convolutional and max-pooling layers, followed by three fully-connected layers. In this network, max-pooling layers are used repeatedly to enhance the classification capability and to improve the robustness of the extracted features. Compared with the 1D-LeNet-5 and 1D-AlexNet networks, the proposed ODCNN increases the accuracy significantly.

The architecture of ODCNN.
The first layer of ODCNN is a convolutional layer that receives the input, with filters of size 50 and a stride of 4. It is followed by a max-pooling layer with a kernel size of 3 and a stride of 2. Layers 3 and 4 are a similar pair: the convolutional layer 3 has a kernel size of 50 and a stride of 1, and the max-pooling layer 4 has a kernel size of 3 and a stride of 2. Two further sets of convolutional and max-pooling layers follow in layers 5 to 8, where the convolutional layers 5 and 7 have a kernel size of 5 and a stride of 1, and the max-pooling layers 6 and 8 have a kernel size of 3 and a stride of 2. Subsequently, two more groups of convolutional and max-pooling layers follow in layers 9 to 12, where the convolutional layers 9 and 11 have a kernel size of 2 and a stride of 1, and the max-pooling layers 10 and 12 have a kernel size of 2 and a stride of 2. The output stage of ODCNN also consists of three fully-connected layers. The first two (layers 13 and 14) have 200 units each, and every unit in the 14th layer is connected to all nodes in the 13th layer. The last layer is a Softmax classifier with 10 units, which is used to recognize the labels from 0 to 9. To suppress over-fitting effectively and improve the capability of the network, the dropout algorithm is used in the fully-connected layers with the dropout parameter set to 0.5. The corresponding structural parameters of ODCNN are given in Table 3.
Parameters of proposed ODCNN.
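To make the kernel sizes and strides of the three architectures easier to compare, the following sketch traces the length of the signal through the feature-extraction layers of each network. Two things are assumptions rather than facts from the text: the input length of 2048 points and the absence of padding. The layer lists are transcribed from the descriptions above, and the names `LENET_LAYERS`, `ALEXNET_LAYERS`, `ODCNN_LAYERS`, and `trace_lengths` are introduced only for this illustration.

```python
def output_length(n, kernel, stride):
    """Length after an unpadded convolution or pooling layer (assumed padding = 0)."""
    return (n - kernel) // stride + 1


def trace_lengths(input_length, layers):
    """Return the signal length after each (type, kernel, stride) layer."""
    lengths = []
    n = input_length
    for _kind, kernel, stride in layers:
        n = output_length(n, kernel, stride)
        lengths.append(n)
    return lengths


# (layer type, kernel size, stride), as described in the text.
LENET_LAYERS = [("conv", 50, 1), ("pool", 2, 1), ("conv", 50, 1), ("pool", 2, 1)]
ALEXNET_LAYERS = [("conv", 50, 4), ("pool", 3, 2), ("conv", 50, 1), ("pool", 3, 2),
                  ("conv", 2, 1), ("conv", 2, 1), ("conv", 2, 1), ("pool", 3, 2)]
ODCNN_LAYERS = [("conv", 50, 4), ("pool", 3, 2), ("conv", 50, 1), ("pool", 3, 2),
                ("conv", 5, 1), ("pool", 3, 2), ("conv", 5, 1), ("pool", 3, 2),
                ("conv", 2, 1), ("pool", 2, 2), ("conv", 2, 1), ("pool", 2, 2)]

INPUT_LENGTH = 2048  # assumed sample length; not stated in the text

for name, layers in [("1D-LeNet-5", LENET_LAYERS),
                     ("1D-AlexNet", ALEXNET_LAYERS),
                     ("ODCNN", ODCNN_LAYERS)]:
    print(name, trace_lengths(INPUT_LENGTH, layers))
```

Under these assumptions, the stride-1 pooling of 1D-LeNet-5 barely shortens the signal, whereas the six convolution and pooling groups of ODCNN compress it to a handful of points per feature map before the fully-connected layers.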
Case study
In this section, the experimental apparatus used to obtain the vibration signals of rolling bearings in the normal state and under different fault types is described. In addition, the data processing technique is presented in detail.
Description of the experimental apparatus
To compare the capability of the three models, the vibration data provided in the Open Bearing Database of Case Western Reserve University (CWRU) are used for analysis. The corresponding experimental platform is shown in Figure 4. 28

Experimental apparatus.
The platform consists of a torque transducer, a dynamometer, and a power motor. The motor on the left generates the driving force, and the dynamometer on the right, which is connected through an intermediate torque sensor, generates the rated loads (0, 1, 2, and 3 hp). The test rolling bearings support the motor shaft, and the basic parameters of the bearing used in the experiment are given in Table 4. Accelerometers are attached with magnetic bases to both the drive end and the fan end of the motor housing in the vertical direction. Vibration signals are acquired under different working conditions, including normal and faulty situations. Electrical discharge machining (EDM) is applied to the test bearings to introduce faults with diameters of 7, 14, and 21 mils (1 mil = 0.001 inches), simulating different fault types of rolling bearings. The vibration signals were collected at 12 kS/s, and the drive-end bearing fault data were also collected at 48 kS/s. The system simulates the normal state (NB) and three kinds of fault types, namely inner ring (IR), rolling element, that is, ball (RE), and outer ring (OR) faults, and each type of fault has three fault sizes (7, 14, and 21 mils, respectively). Therefore, there are 10 running states of the bearings. The description of the rolling bearing fault types is given in Table 5. The time-domain vibration signals of the rolling bearing for the normal state and for inner ring, rolling ball, and outer ring faults are illustrated in Figure 5.
Parameters of rolling bearings.
The faults classification of rolling bearings.

Time-domain vibration signals of rolling bearing. (a) Normal state. (b) Rolling balls. (c) Outer ring. (d) Inner ring.
Data augmentation
To achieve high-accuracy fault diagnosis, the number of training samples needs to be large enough. However, the sample size available from the CWRU Bearing Data Center is limited. In this paper, the overlap method is applied to increase the number of training and testing samples. Figure 6 sketches this method, which slices the vibration signals of rolling bearings into segments that overlap one another. 25

Data augmentation by overlapping.
With the method of overlap, the training and testing samples are obtained.
where ⌊•⌋ denotes the round-down (floor) operation.
The
where
Datasets A, B, C, and D correspond to loads of 0, 1, 2, and 3 hp, respectively. Dataset A contains 400 training samples and 180 testing samples, while Datasets B, C, and D each contain 1000 training samples and 180 testing samples. Every dataset covers the 10 running states. The details of all the datasets are shown in Table 6.
Bearing datasets.
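The overlap method used for data augmentation can be sketched as follows. The segment length, the shift between neighboring segments, and the function names are illustrative assumptions, since the values used by the authors are not given in the text; the segment count uses the round-down operation in the usual way for sliding windows.

```python
def count_segments(total_length, segment_length, shift):
    """Number of full segments obtainable with a sliding window (floor division)."""
    if total_length < segment_length:
        return 0
    return (total_length - segment_length) // shift + 1


def slice_with_overlap(signal, segment_length, shift):
    """Cut a signal into segments whose start points are `shift` samples apart."""
    n = count_segments(len(signal), segment_length, shift)
    return [signal[i * shift:i * shift + segment_length] for i in range(n)]


# Toy signal of 20 points; real vibration records are far longer.
signal = list(range(20))
segments = slice_with_overlap(signal, segment_length=8, shift=4)
for seg in segments:
    print(seg)
```

With a shift smaller than the segment length, neighboring segments share samples (here half of each segment), so many more training samples are obtained than with non-overlapping slicing.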
Validation and analysis
In this section, the classification accuracy and convergence of the 1D-LeNet-5, 1D-AlexNet, and ODCNN models are compared using the datasets obtained in Section 3. In addition, the feature visualization for the three CNNs is presented and discussed.
Analysis and discussion
Classification accuracy of different CNNs
Three experiments are carried out. The first two determine the training and testing accuracy of 1D-LeNet-5 and 1D-AlexNet using the datasets prepared in Section 3.2 (refer to Table 6), and the third evaluates the diagnostic capability of the proposed ODCNN on the same datasets. The accuracy of classification is calculated as follows
$$\mathrm{Accuracy} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{total}}} \times 100\%$$
where $N_{\mathrm{correct}}$ is the number of correctly classified samples and $N_{\mathrm{total}}$ is the total number of samples.
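The accuracy measure amounts to the fraction of samples whose predicted label matches the true label. A minimal sketch follows, with hypothetical label lists standing in for the real test results.

```python
def accuracy(true_labels, predicted_labels):
    """Percentage of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)


# Hypothetical labels in the range 0 to 9 (one per running state).
y_true = [0, 1, 2, 2, 3, 9, 9, 5]
y_pred = [0, 1, 2, 1, 3, 9, 8, 5]
acc = accuracy(y_true, y_pred)
print(acc)
```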
The training and testing accuracies of the three approaches are shown in Tables 7–9. Compared with 1D-LeNet-5 and 1D-AlexNet, the ODCNN proposed in this paper possesses better recognition capability. Ranked from high to low recognition accuracy, the four datasets are A, B, C, and D. It is presumed that an increase in load leads to an increase in the vibration and noise of the system, which reduces the bearing fault recognition rate. 29
Accuracy of 1D-LeNet-5.
Accuracy of 1D-AlexNet.
Accuracy of ODCNN.
To show the fault recognition ability of the three CNNs clearly, the confusion matrix is introduced to analyze the prediction results of the three models in detail. Figure 7 shows the confusion matrices of the three models under a load of 3 hp. It can be seen that in all three models, the diagnostic accuracy for NB, IR07, IR21, RE14, RE21, and OR07 is high, while it is rather poor for the 14 mil outer ring fault (OR14). Compared with 1D-LeNet-5 and 1D-AlexNet, the ODCNN increases the diagnostic rate for RE07 significantly.

The confusion matrix of three CNNs. (a) 1D-LeNet-5. (b) 1D-AlexNet. (c) ODCNN.
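A confusion matrix of the kind shown in Figure 7 can be built by counting, for each true class, how often each class was predicted. The sketch below uses three hypothetical classes instead of the ten running states, and the per-class rate it computes (correct predictions divided by the number of samples of that class) is one common way to read such a matrix, not necessarily the authors' exact procedure.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_rate(matrix):
    """Fraction of each true class that was classified correctly."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append(row[i] / total if total else float("nan"))
    return rates


# Hypothetical labels for a three-class toy problem.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2, 2, 1]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
rates = per_class_rate(cm)
for row in cm:
    print(row)
print(rates)
```

Off-diagonal entries show which pairs of states are confused with each other, which is how a weak class such as OR14 stands out even when the overall accuracy is high.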
Convergence analysis
Figure 8 illustrates how the training and validation accuracy and the corresponding loss functions of the three algorithms evolve over the iterations, which allows their convergence speed to be examined. The training and validation accuracy curves in Figure 8 indicate that all three algorithms can reach high accuracy. The training and validation loss curves show that all three algorithms have good convergence performance.

The accuracy and loss function of three different CNNs. (a) 1D-LeNet-5. (b) 1D-AlexNet. (c) ODCNN.
Feature visualization
To demonstrate the effectiveness and feature extraction capability of the presented ODCNN, the t-distributed stochastic neighbor embedding (t-SNE) technique is applied to reduce the dimension of the extracted features for visualization. This method is an efficient nonlinear dimensionality reduction approach that maps data from a high-dimensional space into a lower-dimensional one for visualization.
Taking dataset D as an example, the two-dimensional visualizations of the fault features of the three CNNs, which are extracted from the Softmax classifier, are depicted in Figure 9, in which different colors represent different fault types of rolling bearings. The visualization of the three CNNs reveals two noteworthy phenomena. Firstly, the feature discrimination becomes gradually more obvious from one CNN to the next. As shown in Figure 9(a), the features are not separable in 1D-LeNet-5, whereas in ODCNN the discrimination of fault features is more pronounced, as shown in Figure 9(c), because the three CNNs are successively deeper and wider. Secondly, Figure 9(a) and (b) reveal that the visualizations of RE07 and OR14 have some overlapping regions, indicating that the fault types RE07 and OR14 may not be easy to discriminate. However, the proposed ODCNN separates them effectively, as shown in Figure 9(c).

Feature visualization of three CNNs. (a) 1D-LeNet-5. (b) 1D-AlexNet. (c) ODCNN.
Conclusions
To realize high-precision and intelligent fault diagnosis of rolling bearings, a novel CNN with a one-dimensional structure is proposed in this paper. Compared with the 1D-LeNet-5 and 1D-AlexNet CNNs, the proposed approach is deeper and wider. Feature extraction is realized by six sets of convolutional and max-pooling layers, which extract as many features of the rolling bearing vibration signals as possible. The experimental results indicate that the proposed ODCNN possesses good classification capacity and sufficient accuracy. The visualization of the feature distribution via the t-SNE method indicates that it has better classification performance than traditional CNNs.
Although the proposed approach has the above advantages, there is still room for improvement in future work. In particular, data filtering before feature classification is necessary, because the quality of the raw bearing vibration data significantly affects the accuracy and convergence of classification in practical applications. Notably, the proposed ODCNN is expected to be widely applicable to the analysis of other similar one-dimensional time signals, such as voice recognition, atrial fibrillation detection, and rotating machinery vibration.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by Natural Science Foundation of Zhejiang Province of China under Grant LQ20E050017, National Key Technologies Research & Development Program of China under Grant 2018YFF0212702, Zhejiang Lab’s International Talent Fund for Young Professionals under Grant ZJ2019JS006 and National Natural Science Foundation of China under Grant 61801454.
