A Siamese CNN-BiLSTM-based method for unbalance few-shot fault diagnosis of rolling bearings

Abstract

Small and imbalanced fault samples have a profound impact on the diagnostic performance of a model in the process of locating and quantifying the rolling bearing damage of aeroengines in practice. Therefore, a Siamese Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) model was proposed in this paper. Random selection and cross combination methods were used to augment and balance sample sizes at first. Then, two weight-sharing CNN-BiLSTM models were used for adaptive extraction and distance measurement of weak fault features. Finally, the fault classification was performed based on feature distance. Model performance was verified using simulated fault test data of rolling bearings. The results showed that the Siamese CNN-BiLSTM model could achieve an accuracy of up to 96.0% for quantitative diagnosis and 98.0% for location diagnosis. This model was also capable of solving the imbalanced classification of samples and made it possible to transfer between different rotating speeds and working conditions.

Keywords

Rolling bearing, Siamese CNN-BiLSTM model, small and imbalanced samples, fault diagnosis

Introduction

Rolling bearings have a significant influence on the safety in use, service life, and reliability of rotary machines including aeroengines. Surface damage is one major failure form of rolling bearings, and the damage site and area directly reflect the operating state of rolling bearings, further affecting the reliability of aeroengines. Among the many commonly used methods for monitoring the condition of aircraft engine rolling bearings, vibration analysis¹ has been widely used due to its advantages of simple measurement, high accuracy, and low cost. Thus, accurate evaluation of the size and location of rolling bearing damage based on vibration monitoring data has important implications for early diagnosis of bearing faults and prediction of the remaining useful life of bearings.

Conventional vibration signal analysis methods² and machine learning methods³ require a large amount of prior knowledge and are strongly affected by noise in practice. Thus, these methods have weak robustness and poor generalization ability. Owing to its strength in adaptive feature extraction, deep learning has become a research hotspot in recent years, attracting wide attention and extensive discussion.^4,5

However, rolling bearing data collected from aeroengines during actual operation are mostly normal, and the scarcity and imbalanced distribution of fault samples greatly lower the performance of deep learning models. How to locate and quantify faults with small and imbalanced samples and improve the generalization performance of models in complex working conditions and a changing rotating speed is of great engineering significance to the detection and prediction of rolling bearing faults.^6,7

A series of studies have been proposed to address the problem of imbalanced diagnosis and few samples: Hu et al.⁸ used order tracking and resampling methods to process bearing data at different speeds, and the average accuracy was much better than traditional SVM methods under six cross working conditions with few samples. Li et al.⁹ proposed an autoencoder embedded dictionary learning approach for nonlinear industrial process fault diagnosis, which outperforms several dictionary learning approaches and some other nonlinear fault diagnosis methods. Berenji et al.¹⁰ trained an autoencoder with unlabeled samples, and applied a contrastive learning-based post-training to make use of limited available labeled samples to improve the rolling bearing feature set discriminability.

However, most existing deep learning methods for fault diagnosis only focus on prediction accuracy without considering the limitation of both small and imbalanced samples of rolling bearing. Besides, specialized small sample learning networks lack reasonable embedding modules and have certain shortcomings such as insufficient feature extraction ability.

The recently emerging meta-learning method^11,12 guides the learning of new tasks with previous knowledge and experience. This method aims to teach the model how to learn, namely, to capture the essential features of data through comparing different small-sample data and thus to acquire the most information with the least memory. Many studies^13,14 show that meta-learning models have strong generalization ability, high accuracy, and good robustness for small-sample and even single-sample learning problems.

In meta-learning, metric learning-based SiameseNets (SNs)¹⁵ extract features with two weight-sharing networks of the same structure and define the classification criteria based on distance metrics. As a useful meta-learning network with high performance, SNs are extensively applied in the fields of target tracking^16,17 and machine translation,¹⁸ etc., but there are few studies on their application in the estimation of surface damage of aircraft engine rolling bearings.

In the location and quantitative detection of rolling bearing damage, SNs have several advantages. Firstly, SNs not only extract features from input samples, but also find similarities between two samples by calculating their distance. With this structure, SNs are able to diagnose faults with small samples at high accuracy and show good generalization performance. Secondly, SNs are trained with sample pairs obtained by random selection and cross combination, which augments the data and balances the sample sizes. SNs therefore provide a solution to the common problem of small and imbalanced samples in fault diagnosis.

If a convolutional neural network (CNN) module is embedded in Siamese networks, it can extract the spatial features from rolling bearing vibration signals, but cannot take the temporal features of data into account. A Long Short-Term Memory (LSTM) network can model original sequence data and extract temporal features, but it ignores the multidimensional features of vibration data. Therefore, in this study, CNN and bidirectional Long Short-Term Memory (BiLSTM) models were combined to constitute a feature extractor in the Siamese network. This combined model can simultaneously extract spatial and temporal features of rolling bearing faults and improve the diagnostic accuracy of the network.

In summary, this study proposes a Siamese CNN-BiLSTM that inherits both the advantages of a Siamese network in fault detection with small and imbalanced samples and the advantages of a CNN-BiLSTM in feature extraction. The proposed model can locate and quantify rolling bearing damage with small and imbalanced samples and conduct transfer learning under changing rotating speeds and complex working conditions. The generalization performance of the Siamese CNN-BiLSTM was also verified. The organization of the paper is shown in Figure 1:

Figure 1.

The organization of the paper.

The contribution of the method to the fault diagnosis of rolling bearings is as follows:

(1) The Siamese network architectural pattern can effectively solve the problem of small and unbalanced fault samples in the process of damage location and quantitative identification of the actual aeroengine rolling bearing, and improve the diagnosis effect of the model.

(2) The combination of meta learning and metric learning can achieve diagnosis between different rotating speeds under complex working conditions of rolling bearings, and improve the generalization ability of the model.

(3) The traditional model has a high dependence on prior knowledge of rolling bearings and insufficient feature extraction ability. However, the combination of CNN and BiLSTM can fully extract the multidimensional and temporal features of rolling bearing vibration signals to better learn the advanced features of different fault types of signals.

Siamese CNN-BiLSTM network model

Siamese net

SNs are a neural network structure for small-sample learning. There are usually two inputs, which are encoded by two weight-sharing neural networks of the same structure, and the output is the similarity between the two inputs. SN-based rolling bearing fault diagnosis methods use offline-trained models to classify a new unknown fault by similarity. In the diagnosis process, the model does not need to be updated in real time, so it can diagnose faults quickly and in real time. In addition, SNs learn by finding the similarities between two inputs with the help of two identical structures: samples are selected and combined into positive and negative sample pairs, which supplies the labels against which the similarity between the two inputs is measured. Therefore, SNs have a particular advantage in small and imbalanced sample learning.

The learning principles of SNs are as follows. Firstly, two samples $x_{1}$ and $x_{2}$ are randomly chosen from the samples to be learned as the two inputs of the SN. A pair of samples of the same type is labeled 1, and a pair of different types is labeled 0. Then, two weight-sharing neural networks extract features, yielding feature vectors $G_{w} (x_{1})$ and $G_{w} (x_{2})$ . Finally, the SN compares the two outputs with an energy function to measure their similarity. In the training process, the SN continues to optimize the loss function, thus increasing the similarities between samples of the same type and reducing the similarities between samples of different types.

The contrastive loss function was used in this study, and it is defined as:

L (Y, x_{1}, x_{2}) = \frac{1}{2 N} \sum_{n = 1}^{N} \left[ Y E^{2} + (1 - Y) \max {(margin - E, 0)}^{2} \right]

(1)

where Y is the label, with a value of 1 or 0, margin is the similarity threshold, which is set to 1 in this paper, and E is the energy function. The Euclidean distance was used as the energy function in this study, and it is defined as:

E (x_{1}, x_{2}) = ‖ x_{1} - x_{2} ‖_{2} = {(\sum_{i = 1}^{P} {(x_{1}^{i} - x_{2}^{i})}^{2})}^{\frac{1}{2}}

(2)

where P is the characteristic dimension.

According to Formulas (1) and (2), for samples of different types (Y = 0), a larger energy value yields a smaller loss, whereas for samples of the same type (Y = 1), a smaller energy value yields a smaller loss. These two functions can therefore accurately describe the similarity between samples and facilitate feature extraction from samples.
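The behaviour of Formulas (1) and (2) can be checked numerically with a minimal sketch. The function names and the two-dimensional feature vectors below are hypothetical and chosen only for illustration; they stand in for the outputs $G_{w} (x_{1})$ and $G_{w} (x_{2})$ of the two subnetworks.

```python
import math

def euclidean_distance(g1, g2):
    """Energy E of Formula (2): Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))

def contrastive_loss(pairs, labels, margin=1.0):
    """Formula (1): sum of Y*E^2 + (1-Y)*max(margin-E, 0)^2 over N pairs, divided by 2N."""
    total = 0.0
    for (g1, g2), y in zip(pairs, labels):
        e = euclidean_distance(g1, g2)
        total += y * e ** 2 + (1 - y) * max(margin - e, 0.0) ** 2
    return total / (2 * len(pairs))

# Hypothetical feature vectors (illustration only).
close_pair = ([0.0, 0.0], [0.3, 0.4])  # E = 0.5
far_pair = ([0.0, 0.0], [3.0, 4.0])    # E = 5.0

loss_same_close = contrastive_loss([close_pair], [1])  # same type, close together
loss_same_far = contrastive_loss([far_pair], [1])      # same type, far apart
loss_diff_close = contrastive_loss([close_pair], [0])  # different types, too close
loss_diff_far = contrastive_loss([far_pair], [0])      # different types, beyond the margin
```

With margin = 1, a different-type pair contributes no loss once its distance exceeds the margin, while a same-type pair is penalized in proportion to its squared distance.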

Convolutional BiLSTM network model

In order to take both the deep multidimensional features and the temporal features of rolling bearing fault signals into consideration, the CNN-BiLSTM model was used as the feature extractor of the SN. This model is constituted by connecting a deep CNN and a BiLSTM in series. It is able to learn deep features and temporal dynamic information from the original input vibration signals of rolling bearings. The CNN can adaptively capture spatial features of bearing faults from original signals and reduce redundant data. The BiLSTM network is responsible for extracting temporal features from data. The combination of these two networks facilitates the full extraction of fault features in the case of small samples.¹⁹

CNN

CNNs are a multilayer perceptron neural network, which can mine richer and deeper data information through weight-sharing convolution. As the most commonly used method in deep learning, deep CNNs have the advantage of automatically learning more abstract features from data.²⁰ The deep CNN constructed in this paper comprises an input layer, convolutional layers, pooling layers, a dropout layer, batch normalization (BN) layers, and a fully connected layer. The working principles and parameters of each layer are as follows:

(1) Convolutional layer. Convolutional layers extract features from input images: they convolve the input matrices with several convolutional kernels and pass the result through an activation function to obtain the feature maps. There are three convolutional layers, containing 16, 32, and 32 kernels, respectively. The activation function is ReLU. The calculation process of convolutional layers is as follows:

X_{j}^{l} = f (\sum_{i \in M_{j}} X_{i}^{l - 1} \cdot ω_{ij}^{l} + b_{j}^{l})

(3)

where $X_{j}^{l}$ is the $j$ th element at the $l$ th layer of the convolutional layer, $M_{j}$ is the $j$ th convolutional area at the $l - 1$ th layer, $ω_{ij}^{l}$ is weight matrix, $b_{j}^{l}$ is bias, and f is activation function.

(2) Pooling layer. The pooling layer is used to reduce the dimension of input features, so as to improve computational speed and reduce the chance of overfitting. Three 2 × 2 average pooling layers are stacked alternately with three convolutional layers in order to reduce the dimension of data and extract features. The calculation process of pooling layers is as follows:

X_{j}^{l} = f (β_{j}^{l} \cdot d (X_{j}^{l - 1}) + b_{j}^{l})

(4)

where $β$ is weight matrix and $d (\cdot)$ is downsampling function.

(3) Dropout layer. Neurons are randomly set to zero in a ratio of 0.2, thus preventing network overfitting and improving the generalization performance of the model.²¹ The computational process of the dropout layer is:

y = f (W x) \cdot m, \quad m \sim Bernoulli (p)

(5)

where x and y are the input and output of the layer, respectively, W is the weight matrix, and m is the dropout mask of this layer, each element of which follows the Bernoulli distribution and equals 1 with probability p.

(4) BN layer. A BN layer is added after the input layer and before the dropout layer, so as to accelerate network training, prevent exploding or vanishing gradients, reduce the chance of overfitting, and avoid the variance shift problem caused by the joint use of the dropout layer and BN layer.²²

(5) Fully connected layer. After feature extraction by the above-mentioned layers, the fully connected layer classifies features.
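As an illustration of the dropout step in Formula (5), the following minimal sketch applies a Bernoulli mask to an activated layer output. It assumes that f is ReLU and that a zeroing ratio of 0.2 corresponds to a keep probability p = 0.8 in the convention of Formula (5); both are assumptions rather than details confirmed by the text. The helper names and the small weight matrix are hypothetical, and no rescaling of the kept activations is applied because Formula (5) does not include one.

```python
import random

def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]

def matvec(W, x):
    """Matrix-vector product W x, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def dropout_forward(W, x, p, rng):
    """Formula (5): y = f(Wx) * m, where each m_i is 1 with probability p, else 0."""
    activated = relu(matvec(W, x))
    mask = [1 if rng.random() < p else 0 for _ in activated]
    return [a * m for a, m in zip(activated, mask)], mask

# Hypothetical 3x2 weight matrix and input vector (illustration only).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
x = [2.0, 3.0]
y, mask = dropout_forward(W, x, p=0.8, rng=random.Random(0))
```

Elements whose mask entry is 0 are removed from the forward pass for that training step, which is what discourages co-adaptation between neurons.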

BiLSTM network

Although CNNs can extract abstract features of rolling bearing failures, they ignore the temporal relationship between data points when extracting features from the vibration signals of bearings, which are one-dimensional temporal signals. This leads to a loss of fault feature information in the case of small samples. Thus, in this paper, the LSTM network was used to extract temporal relationships between fault features. Meanwhile, in order to take into account both the forward and backward information of the vibration data of rolling bearings and to improve the LSTM network's ability to extract information in the backward direction, the BiLSTM network was established, which is composed of two LSTM layers of opposite directions. The forward propagation layer and the backward propagation layer propagate layer-by-layer starting from the first and last segments of the sequence, respectively. Both of them are coupled to the same output layer and share common weights, and the two results obtained are ultimately processed together. The BiLSTM network can integrate past and future information to further relieve information forgetting and improve prediction accuracy.

Calculations in the LSTM unit are as follows:

\begin{matrix} f_{t} = σ (W_{f} \cdot (h_{t - 1}, x_{k}^{(t)}) + b_{f}) \\ i_{t} = σ (W_{i} \cdot (h_{t - 1}, x_{k}^{(t)}) + b_{i}) \\ {\tilde{c}}_{t} = \tanh (W_{c} \cdot (h_{t - 1}, x_{k}^{(t)}) + b_{c}) \\ c_{t} = f_{t} c_{t - 1} + i_{t} {\tilde{c}}_{t} \\ o_{t} = σ (W_{o} \cdot (h_{t - 1}, x_{k}^{(t)}) + b_{o}) \\ h_{t} = o_{t} \tanh (c_{t}) \end{matrix}

(6)

where W and b are the weight matrices and bias vectors obtained through model learning, respectively; $x_{k}^{(t)}$ is the input vector at time t; and $σ (\cdot)$ and $\tanh (\cdot)$ are the two activation functions. The LSTM unit updates the hidden state $h_{t - 1}$ and cell state $c_{t - 1}$ from time t-1 through a forget gate $f_{t}$ , an input gate $i_{t}$ , and an output gate $o_{t}$ : it first obtains the current cell state $c_{t}$ from the candidate value ${\tilde{c}}_{t}$ , and then the current hidden state $h_{t}$ . The current cell state and hidden state are then passed on to the next unit.
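The update described above can be sketched for a single time step as follows. This is a minimal illustration of the standard LSTM formulation, in which every gate, including the output gate, is an affine map of the concatenation $(h_{t - 1}, x_{k}^{(t)})$ followed by an activation; treating the output gate this way is an assumption. The function names, the `params` dictionary layout, and the use of separate parameters for the forward and backward directions are hypothetical choices made for the sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """W . v + b for a weight matrix W (list of rows) and a bias vector b."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update; params maps 'f', 'i', 'c', 'o' to a (W, b) pair."""
    v = h_prev + x_t  # concatenation (h_{t-1}, x_t)
    f = [sigmoid(z) for z in affine(*params["f"], v)]          # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], v)]          # input gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], v)]  # candidate value
    o = [sigmoid(z) for z in affine(*params["o"], v)]          # output gate
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def run_direction(sequence, hidden, params):
    """Run the cell over a sequence from zero states and return the last hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
    return h

def bilstm_features(sequence, hidden, params_fwd, params_bwd):
    """Concatenate the final hidden states of a forward and a backward pass."""
    forward = run_direction(sequence, hidden, params_fwd)
    backward = run_direction(sequence[::-1], hidden, params_bwd)
    return forward + backward
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate value is 0, so the cell state is simply halved at each step; this gives a convenient sanity check for an implementation.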

In this paper, the CNN model was combined with the BiLSTM model. Firstly, the vibration signals of each bearing were input into the CNN model for two-dimensional feature extraction. Then, the outputs were transmitted as unit time steps to the LSTM model for temporal feature extraction. To achieve this process, the entire CNN network was enclosed in a time-distributed layer so that it could be applied repeatedly and deliver a sequence of extracted image features to the LSTM model. The structure of the BiLSTM network is shown in Figure 2.

Figure 2.

BiLSTM network structure.

Siamese CNN-BiLSTM model-based fault diagnosis of rolling bearings

Cross augmentation of data samples

Current methods for increasing sample size include data augmentation (DA) and generative adversarial networks (GANs).^23,24 DA cannot fundamentally change the model's dependence on large amounts of data, and it is prone to generating invalid samples that the network cannot distinguish from the originals. For GANs, it is difficult to keep the two adversarial networks balanced during training, which can easily make training unstable. In addition, GANs are prone to mode collapse, which tends to produce meaningless samples with little diversity. By contrast, combining samples into pairs makes maximum use of a small number of samples and avoids the interference of false and invalid samples with the network.

SNs select samples to form sample pairs, which are taken as the training set. Pairs of samples of the same type are labeled as positive sample pairs, and pairs of samples of different types are labeled as negative sample pairs. The similarity between the two inputs is measured with the contrastive loss function. For an “n-way k-shot”²⁵ problem, a total of $n A_{k}^{2}$ positive sample pairs and $A_{n}^{2} C_{k}^{1} C_{k}^{1}$ negative sample pairs can be acquired through random selection and combination, where A denotes the number of permutations and C denotes the number of combinations. Together, these $nk (nk - 1)$ pairs augment the original $nk$ samples by a factor of $nk - 1$, which not only improves the utilization of samples, but also reduces the dependence on training samples, greatly increasing the accuracy of diagnosis with small samples.
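The pair counts can be verified by direct enumeration. The sketch below builds every ordered pair for a hypothetical 3-way 4-shot task with placeholder sample identifiers; the function name and the task layout are illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations

def build_pairs(samples_by_class):
    """Return ordered (a, b, label) triples: label 1 for same class, 0 otherwise."""
    labelled = [(s, c) for c, samples in samples_by_class.items() for s in samples]
    return [(a, b, int(ca == cb)) for (a, ca), (b, cb) in permutations(labelled, 2)]

# Hypothetical 3-way 4-shot task (illustration only).
n, k = 3, 4
task = {c: [f"class{c}_sample{j}" for j in range(k)] for c in range(n)}

pairs = build_pairs(task)
positives = sum(label for _, _, label in pairs)  # n * k * (k - 1) ordered pairs
negatives = len(pairs) - positives               # n * (n - 1) * k * k ordered pairs
fold = len(pairs) // (n * k)                     # pairs per original sample
```

For n = 3 and k = 4, the 12 original samples yield 36 positive and 96 negative ordered pairs, that is, 132 pairs in total or 11 = nk - 1 pairs per original sample. Because negative pairs outnumber positive ones, a balanced training set can be drawn by sampling equal numbers from each group.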

Siamese CNN-BiLSTM model

To fully extract the spatial and temporal features of rolling bearing faults and to quantify and locate rolling bearing damage with small and imbalanced samples, the Siamese CNN-BiLSTM model comprising two identical CNN-BiLSTM subnetworks was developed in this study. In particular, the parameter selection of the CNN plays a crucial role in the performance of the model. A deeper model has better nonlinear expression ability and can fit more complex feature inputs. However, excessively deep networks may suffer from gradient instability, network degradation, etc., resulting in a decrease in model performance. In order to find the most suitable network structure, the convolutional-pooling layer pair is taken as the basic nonlinear transformation module. By gradually stacking more of these modules, the loss on the validation set and the training time of the network are examined, and the optimal combination is selected accordingly. The impact of the number of nonlinear modules on network loss and training speed for two bearing datasets is shown in Figure 3.

Figure 3.

Influence of the number of groups of convolutional-pooling layers on network performance.

From Figure 3, it can be seen that as the number of convolutional-pooling groups increases, the loss of the network on the validation set decreases significantly at first, reaching its lowest point at three groups. Then, as the network depth increases further, the loss gradually increases owing to the overfitting caused by gradient instability, and the complexity of the network leads to a significant increase in training time. Therefore, three groups of nonlinear modules are selected to form the main part of the network, giving a convolutional network model that consists of an input layer, three convolutional layers, three pooling layers, a dropout layer (p = 0.2), a BN layer, a fully connected layer, and a regression layer. The specific internal parameters are obtained through controlled-variable experiments and comparative analysis; that is, the selected parameters minimize the training loss of the network. The network parameters are shown in Table 1.

Table 1.

Network parameters of CNN-BiLSTM.

Layer	Parameter	Step	Output	Padding
Input layer	25 × 18 × 1	/	25 × 18 × 1	/
Convolutional layer 1	3 × 3 × 16	1 × 1	25 × 18 × 16	Same
Pooling layer 1	2 × 2	2 × 2	12 × 4 × 16	0
Convolutional layer 2	3 × 3 × 32	1 × 1	12 × 4 × 32	Same
Pooling layer 2	2 × 2	2 × 2	6 × 2 × 32	0
Convolutional layer 3	3 × 3 × 32	1 × 1	6 × 2 × 32	Same
Pooling layer 3	2 × 2	2 × 2	3 × 1 × 32	0
BiLSTM layer	100 × 2	/	1	/
Fully connected layer	1	/	1×1×1	/
Regression layer	1	/	1	/

The pre-processed original signals are input into the CNN in the form of 25 × 18 × 1 matrix graphs. After three convolution operations and three pooling operations, the features are flattened and then input into the BiLSTM network containing 200 nodes. Finally, the fully connected layer maps the features into feature vectors, and the probability of two features being similar is obtained by the activation function. Rolling bearing damage is located and quantified by checking whether the unknown sample is similar to the known sample. The feature extraction process is shown in Figure 4.

Figure 4.

Siamese CNN-BiLSTM model-based feature extraction process.

Diagnosis process

The rolling bearing damage detection based on the Siamese CNN-BiLSTM model includes three steps below.

(1) Pre-treatment. The original vibration signal of the rolling bearing is cut into several parts according to impact peaks, and 450 points near each peak are taken as a training sample. Multiple samples are obtained from the data of each type of fault after going through the complete vibration cycle.

(2) Model training. Two samples are randomly selected from known fault samples to form a sample pair $(x_{1}, x_{2})$ , which is then input into the model for feature extraction. The feature distance $E (x_{1}, x_{2})$ between the two inputs is calculated by Formula (2) to determine whether they belong to the same type. The loss $L (Y, x_{1}, x_{2})$ of the sample pair with label Y is calculated by Formula (1), and the Adam optimizer is used for iterative optimization to reduce the loss.

(3) Unknown fault diagnosis. Samples are selected from known fault samples to form a support set $S = \{(x_{1}, y_{1}), \dots, (x_{k}, y_{k})\}$ , whose samples are paired successively with the unknown fault sample $\hat{x}$ to produce sample pairs. The sample pairs are then input into the Siamese CNN-BiLSTM network, and the probability that each support sample and the test sample belong to the same class is calculated. The fault type with the largest probability of being similar is selected according to Formula (7) as the type of the current test sample.

C (\hat{x}, S) = \underset{c}{\arg\max} \, P (\hat{x}, x_{c}), \quad x_{c} \in S

(7)

For an N-class classification problem, N rounds of testing are required, and the support sets are $S_{n} = (S_{1}, \dots, S_{N})$ . The N similarity values of each class are summed, and the class with the largest sum is regarded as the final class of the tested sample (Formula (8)).

C (\hat{x}, (S_{1}, \dots, S_{N})) = \underset{c}{\arg\max} \sum_{n = 1}^{N} P (\hat{x}, x_{cn}), \quad x_{cn} \in S_{n}

(8)
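The decision rules in Formulas (7) and (8) can be sketched as follows. In this minimal illustration the trained Siamese CNN-BiLSTM network is replaced by a hypothetical stand-in, `similarity`, which simply maps the Euclidean distance between two feature vectors into the interval (0, 1]; the class labels and feature values are likewise invented for the example.

```python
import math

def similarity(a, b):
    """Stand-in for the network output P: larger when two feature vectors are closer."""
    distance = math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    return math.exp(-distance)

def classify(query, support_sets, sim=similarity):
    """Formula (8): sum the similarity per class over all support sets, take the argmax.

    support_sets is a list of dicts, each mapping a class label to one known sample.
    With a single support set this reduces to Formula (7).
    """
    scores = {}
    for support in support_sets:
        for label, sample in support.items():
            scores[label] = scores.get(label, 0.0) + sim(query, sample)
    return max(scores, key=scores.get), scores

# Hypothetical two-dimensional features for two fault sizes (illustration only).
support_sets = [
    {"0.8 mm": [0.0, 0.0], "1.6 mm": [1.0, 1.0]},
    {"0.8 mm": [0.1, 0.0], "1.6 mm": [0.9, 1.1]},
]
predicted, scores = classify([0.2, 0.1], support_sets)
```

Summing over several support sets makes the decision less sensitive to any single unrepresentative support sample.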

Rolling bearing fault simulation testing

Test equipment

The test equipment used in this study is an aeroengine “rotor-rolling bearing-casing” tester, which is built at a scale of 1:3 based on a real engine model. Its overall and internal structures are shown in Figure 5. The tester has a structure similar to that of a real aeroengine. It has the same external casing as that of a real engine, but the internal structure is simplified so that effective rolling bearing vibration signals can be acquired.

Figure 5.

Aeroengine rotor tester: (a) overall structure and (b) internal structure.

The bearing used in the experiment is an HRB 6206 single-row deep groove ball bearing. Pits were made on the outer race, inner race, and ball of the bearing through wire electrical discharge machining, so as to simulate faults at different sites. Meanwhile, several pits of different sizes were made on the outer raceway to simulate different sizes of faults. The rolling bearing is pictured in Figure 6, and its parameters are presented in Table 2.

Figure 6.

Picture of the rolling bearing after faults are made: (a) outer race fault, (b) inner race fault, and (c) ball fault.

Table 2.

Basic dimensional parameters of the 6206 bearing.

Type	Inner race diameter	Outer race diameter	Ball diameter	Pitch diameter	Ball No.	Contact angle
HRB 6206	30 mm	62 mm	9.5 mm	46 mm	9	0°

The simulation experiment is divided into two parts. In the first part, a normal bearing, a bearing with a faulty inner race, a bearing with a faulty outer race, and a bearing with a faulty ball were installed in turn inside the rotor tester with casing. Vibration acceleration sensors were arranged on the bearing seat and in the horizontal direction of the casing. Acceleration sensors (B&K 4805) and data acquisition boards (NI USB9234) were used to collect vibration signals for damage at different sites on the rolling bearing. In the second part, nine different sizes of penetrating grooves were made on the outer raceway of the rolling bearing by wire electrical discharge machining to simulate different sizes of spalling damage. B&K 4805 sensors on the bearing seat collected the signals for the different sizes of damage on the rolling bearing. The sampling frequency in the experiment is 10 kHz. The experiment scheme is shown in Table 3.

Table 3.

Rolling bearing fault simulation testing scheme.

No.	Rotating speed (r/min)	State of the rolling bearing	Detected point
1	1500	Normal; inner race failure; outer race failure; ball failure	Horizontal direction of casing; bearing seat
2	1500, 2400, 3000	Outer race failure: 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, and 2.4 mm	Bearing seat

Experimental data

Vibration acceleration signals of rolling bearings were collected following the above-mentioned experimental scheme. Time-domain waveforms of faults of different sizes and at different locations are shown in Figures 7 and 8.

Figure 7.

Time-domain waveforms of faults of different sizes.

Figure 8.

Time-domain waveforms of faults at different locations.

According to Figures 7 and 8,

(1) it is impossible to determine the fault size of the rolling bearing only based on the amplitude of vibration impacts, and corresponding feature extraction models are needed to extract feature information from the temporal data in the time-domain waveforms of impact signals.

(2) Signals from the bearing seat show notable fault impact features with a high amplitude, while the impact features in signals from the casing are masked by noise and are not evident, with an extremely low amplitude. In summary, conventional signal processing methods cannot directly extract the fault size and location information of bearings. A deep feature extraction model is needed to extract the deep features hidden behind the vibration time-domain waveform and noise.

Siamese CNN-BiLSTM model-based quantitative diagnosis of rolling bearing damage

The identification of different damage sizes for rolling bearings is in effect a way of tracking how damage changes over time. The trained model can identify specific damage sizes based on a small number of samples, which in essence amounts to monitoring the evolution of the damage. In this paper, the Siamese CNN-BiLSTM model was trained with nine different sizes of faults on the rolling bearing. The samples were randomly selected and combined in order to balance and augment the data. Two samples of the same class were labeled as a positive sample pair, and two samples of different classes were labeled as a negative sample pair. After model training, the test data and known data were input into the model as sample pairs, and whether they belong to the same class was determined based on the distance metric. In this way, the fault size was determined.

Data pre-treatment

The original vibration signals collected from the rolling bearing were pre-processed, eliminating the influence of noise while saving the essential information of fault features. This pre-treatment helps improve diagnosis accuracy. It is necessary to analyze the law and characteristics of impact signals caused by damage on the rolling bearing surface on the mechanism level. According to the studies by Randall and Sawalhi, the interval between the two impact peaks from the ball entering to leaving the spalling area is proportional to the fault size, and it can be used as a measure of the fault size.^26,27 The principle is shown in Figure 9.

Figure 9.

Vibration effect caused by the ball from entering to leaving the spalling area.

During data pre-treatment, vibration signals containing fault information should be preserved. In this study, the peak value of the vibration signal generated by each impact was taken as the origin, and 150 points before the origin and 300 points after the origin were chosen for continuous interleaved sampling to form a 25 × 18 data matrix as a learning sample. Multiple learning samples were obtained by going through all the impact cycles of each time sequence (Figure 10). This pre-treatment method segments the original signal without unnecessary processing, so it preserves as much of the fault information in the vibration signal as possible while limiting interference by noise.

Figure 10.

Original signal segmenting and continuous interleaved sampling.
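The segmentation step can be sketched as below. This is a minimal illustration under stated assumptions: the impact peak positions are taken as already known, the function name is hypothetical, and the 450 points are folded row by row into the 25 × 18 matrix. The exact ordering implied by "continuous interleaved sampling" is not specified in the text, so the row-major layout is an assumption.

```python
def segment_around_peak(signal, peak_index, before=150, after=300, rows=25, cols=18):
    """Cut before + after points around an impact peak and fold them into rows x cols."""
    if before + after != rows * cols:
        raise ValueError("window length must equal rows * cols")
    start, stop = peak_index - before, peak_index + after
    if start < 0 or stop > len(signal):
        return None  # not enough signal on one side of this peak
    window = signal[start:stop]
    # Row-major fold (assumed layout): row r holds points r*cols .. (r+1)*cols - 1.
    return [window[r * cols:(r + 1) * cols] for r in range(rows)]

# Synthetic ramp signal with a hypothetical peak at index 500 (illustration only).
signal = list(range(1000))
sample = segment_around_peak(signal, peak_index=500)
```

Repeating the call for every detected impact peak in a recording yields the set of learning samples for that fault type; peaks too close to either end of the recording are skipped.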

Attention mechanism

The attention mechanism in the Transformer model²⁸ was introduced in this paper to promote the capture of relationship features between temporal signals. This mechanism can assign a weight to the input by itself, thereby enabling the model to focus on the essential information of rolling bearing faults and improving the efficiency of feature extraction. The process is as follows:

Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V

(9)

where Q, K, and V are the query matrix, key matrix, and value (input data) matrix, respectively; $K^{T}$ is the transpose of K, and $d_{k}$ is the dimension of the key vectors.
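Formula (9) can be illustrated with a minimal scaled dot-product attention sketch. The helper names and the small matrices are hypothetical; in a trained model, Q, K, and V would be learned projections of the network's features.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention(Q, K, V):
    """Formula (9): softmax(Q K^T / sqrt(d_k)) V, returning the output and the weights."""
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V), weights

# Hypothetical matrices: one query, two key/value pairs (illustration only).
Q = [[0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 0.0], [0.0, 4.0]]
output, weights = attention(Q, K, V)
```

Each row of the weight matrix sums to 1, so the output is a weighted average of the rows of V; positions whose keys match the query more closely receive larger weights.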

The attention mechanism was introduced to capture data features in this paper. The output of the last convolutional layer was activated and then up-sampled to the original image size to yield the attention activation region (Figure 11). The intensity of the color represents the size of the attention weight. Areas with a higher weight are paid more attention by the network. It can be seen from the figure that the network puts more emphasis on the area near the vibration peak, which reflects the damage size. The finding demonstrates that the network can learn the essential features of faults.

Figure 11.

Attention mechanism: (a) original data matrix and (b) attention heatmap.

Diagnosis results

To show the classification effect clearly, the t-distributed stochastic neighbor embedding (t-SNE) method was used to visualize the features extracted by the network. This method maps each data point to a corresponding probability distribution, thereby reducing the dimension of the data so that they can be visualized.²⁹ The results are shown in Figure 12.

Figure 12.

Dimension reduction results of nine-class features of the SN.

As shown in Figure 12:

(1) The Siamese CNN-BiLSTM model groups data of the same type together, thus achieving fault diagnosis with small samples.

(2) After dimension reduction, the features of faults of different sizes are distributed in a regular pattern. Points corresponding to similar damage sizes lie closer together, demonstrating that the Siamese CNN-BiLSTM model can classify data based on distance measurement and adaptively learn the damage pattern of rolling bearings.
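The idea of classifying by distance measurement can be sketched as follows. This is only an illustration under stated assumptions: the embeddings are made-up 2-D vectors rather than network outputs, the Euclidean distance and the mean-distance decision rule are common choices that may differ from those used in this paper, and the class names and function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_by_distance(query, support):
    """Assign the query to the class whose labeled embeddings are closest on average.

    support maps each class label to a list of embedding vectors.
    Returns the predicted label and the mean distance to every class.
    """
    mean_dist = {
        label: sum(euclidean(query, emb) for emb in embs) / len(embs)
        for label, embs in support.items()
    }
    return min(mean_dist, key=mean_dist.get), mean_dist


# Hypothetical embeddings for two damage-size classes (not data from the paper).
support = {
    "small_damage": [[0.0, 0.0], [0.2, 0.1]],
    "large_damage": [[1.0, 1.0], [0.9, 1.2]],
}
label, distances = classify_by_distance([0.1, 0.0], support)
```

Because the decision depends only on distances to a few labeled examples, a new class can in principle be handled by adding its examples to `support` without changing the decision rule.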

SNs have special advantages in small sample learning as they are trained with samples obtained by random selection and combination methods and learn similarity metrics. To further investigate the effect of the sample size on the performance of the network, the classification accuracy of the Siamese CNN-BiLSTM model was examined with different sizes of training samples. The number of training samples increases while the number of testing samples remains 100. The results are shown in Figure 13.

Figure 13.

Effects of the sample size on the accuracy of the network (2400 r/min).

As can be seen from Figure 13:

(1) The CNN, BiLSTM, and Siamese CNN-BiLSTM models all have a low diagnosis accuracy when there are extremely few samples. As the number of samples increases, the models are trained more thoroughly and their accuracy increases gradually.

(2) Compared with the CNN and BiLSTM models, the Siamese CNN-BiLSTM model has a higher accuracy across the small-sample settings tested, indicating its superiority in diagnosis with small samples.

Transferring between different rotating speeds

As the rotating speed of the rotor changes, the vibration signal of the rolling bearing changes significantly, which directly degrades the diagnosis performance of a deep learning model. The generalization performance of the models at different rotating speeds (A: 1500 r/min, B: 2400 r/min, and C: 3000 r/min) was studied, and their transfer accuracies between different rotating speeds were analyzed. The number of testing samples is 500. The diagnosis accuracies of the models when transferring between different rotating speeds are shown in Figure 14.

Figure 14.

Effects of transferring between different rotating speeds on the accuracy of the network.

When the rotating speed changes, the diagnosis accuracy of both the CNN and BiLSTM models decreases markedly. By contrast, the average diagnosis accuracy of the Siamese CNN-BiLSTM model is 39.98% and 23.21% higher than that of the CNN and BiLSTM models, respectively. This indicates the good generalization performance of the Siamese CNN-BiLSTM model.
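One way to organize such a cross-speed study is to enumerate every ordered pair of source and target speeds and average the accuracies over those pairs. The sketch below shows that bookkeeping only. The names `transfer_tasks`, `mean_transfer_accuracy`, and `dummy_evaluate` are hypothetical, the evaluator is a stand-in that returns a constant rather than a real result, and the averaging scheme is an assumption that may differ from the one used for Figure 14.

```python
from itertools import permutations


def transfer_tasks(speeds):
    """List every ordered (source, target) pair with source != target."""
    return list(permutations(speeds, 2))


def mean_transfer_accuracy(evaluate, speeds):
    """Average evaluate(source, target) over all transfer tasks."""
    tasks = transfer_tasks(speeds)
    return sum(evaluate(src, tgt) for src, tgt in tasks) / len(tasks)


def dummy_evaluate(source, target):
    """Placeholder: a real version would train on `source` and test on `target`."""
    return 0.5  # constant stand-in, not a measured accuracy


# Speed labels and values as given in the text (r/min).
speeds = {"A": 1500, "B": 2400, "C": 3000}
tasks = transfer_tasks(speeds)  # iterating a dict yields its keys
average = mean_transfer_accuracy(dummy_evaluate, speeds)
```

With three speeds this gives six transfer tasks, such as A to B and B to A, which are evaluated separately because transfer accuracy is generally not symmetric.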

Siamese CNN-BiLSTM model-based location diagnosis of damage

Diagnosis results

The Siamese CNN-BiLSTM model was used to classify four types of faults and locate the fault on the rolling bearing. The classification results after t-SNE visualization are shown in Figure 15. It can be seen that the Siamese CNN-BiLSTM model locates the fault on the rolling bearing well.

Figure 15.

Dimension reduction results of four-class features of the SN.

Transferring in complex working conditions

The vibration signals of real aeroengines in service are generally gathered by sensors mounted on the casing wall. In experimental environments, however, sensors are often installed on the bearing seat for data collection. The feature distributions under the two conditions differ. Thus, models trained with the data collected from the bearing seat are often not applicable to the data collected from the casing wall; that is, such models lack generalization ability. In this study, the Siamese CNN-BiLSTM model was trained with the data collected from the bearing seat and its accuracy was verified with the casing dataset. The accuracy of the Siamese CNN-BiLSTM model was compared with that of conventional single models, and the results are shown in Figure 16.

Figure 16.

Effects of transferring between different working conditions on the accuracy of the network.

According to the comparison results, CNN and BiLSTM models trained with bearing seat signals have relatively lower performance on the casing dataset. In contrast, the Siamese CNN-BiLSTM model has a higher transfer accuracy, demonstrating that the model has better generalization performance and can transfer between different working conditions to some extent.

Siamese CNN-BiLSTM model-based diagnosis with small and imbalanced samples

Due to its unique structure and training method, the SN has special advantages over conventional single deep neural networks in solving the problems of small and imbalanced samples. To demonstrate the superiority of the SN in diagnosis with small and imbalanced samples, a unilateral Siamese CNN-BiLSTM model (that is, a single CNN-BiLSTM branch) was employed as the baseline in this paper. It was trained by the traditional loss optimization method, namely, forward propagation and backpropagation. The damage diagnosis results of the SN were then compared with those of this conventional network based on the data of damage at four different sites on the rolling bearing.

Results of diagnosis with small samples

The Siamese CNN-BiLSTM model and single CNN-BiLSTM model were used to locate the damage at four different sites on the rolling bearing. The training and testing sets contain only 50 samples for each type. The damage location results are compared in Figure 17.

Figure 17.

Results of location diagnosis of damage at four different sites with small samples: (a) Siamese CNN-BiLSTM and (b) CNN-BiLSTM.

As shown in Figure 17, after dimension reduction, the fault features predicted by the single CNN-BiLSTM model show a high overlap ratio, indicating that this model cannot locate the damage at four different sites on the rolling bearing. The reason is that the small number of samples makes it difficult for the model to converge. In other words, the model fails to learn fault features and is thus underfitting. However, after embedding a weight-sharing Siamese structure into the same single CNN-BiLSTM model, a high accuracy was achieved. The reason is that through cross pairing and metric learning, the SN has strong generalization ability even when there is a small number of samples, and meanwhile, the chance of overfitting is greatly reduced.

Results of diagnosis with imbalanced samples

The inputs of common neural networks are the unprocessed samples of each type. In contrast, during the training of the SN, samples are selected from the original data and combined to form Siamese pairs before being input into the embedding module for feature extraction. This process breaks the original classification relationships, allowing the original samples to be presented in the form of new sample pairs, and hence balances the number of samples of different types. To compare the performance of the SN and common networks in solving the problem of sample imbalance, the CNN-BiLSTM and Siamese CNN-BiLSTM models were used to classify the imbalanced data of damage at four different locations. The dataset contained 50 normal samples, 30 inner and outer ring fault samples, and 10 ball fault samples. The classification results are shown in Figure 18.

Figure 18.

Results of location diagnosis of damage at four different sites with imbalanced samples: (a) Siamese CNN-BiLSTM and (b) CNN-BiLSTM.

According to Figure 18, when the number of samples differs between types, the types contribute disproportionately to the gradient of the conventional single model, and the model pays more attention to the types containing more samples during prediction. As a result, the model fails to learn the essential features of the fault. The SN model, however, balances the classes by selecting sample pairs, and thus achieves a higher diagnosis accuracy for imbalanced samples.
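The balancing effect of pair selection can be illustrated with a short sketch. It is not the sampling procedure of this paper, only one plausible version under stated assumptions: the inner and outer ring classes are assumed to hold 30 samples each, every class anchors the same number of pairs, similar and dissimilar pairs alternate, and the function name `make_balanced_pairs` and the class labels are hypothetical.

```python
import random
from collections import Counter
from math import comb


def make_balanced_pairs(samples_by_class, pairs_per_class, seed=0):
    """Build (sample_a, sample_b, is_same) pairs so each class anchors equally many."""
    rng = random.Random(seed)
    labels = list(samples_by_class)
    pairs = []
    for label in labels:
        other_labels = [other for other in labels if other != label]
        for i in range(pairs_per_class):
            if i % 2 == 0:
                # Similar pair: two distinct samples of the same class.
                a, b = rng.sample(samples_by_class[label], 2)
                pairs.append((a, b, 1))
            else:
                # Dissimilar pair: the partner comes from a randomly chosen other class.
                a = rng.choice(samples_by_class[label])
                b = rng.choice(samples_by_class[rng.choice(other_labels)])
                pairs.append((a, b, 0))
    rng.shuffle(pairs)
    return pairs


# Imbalanced class sizes; 30 each for the inner and outer ring is an assumption.
counts = {"normal": 50, "inner_ring": 30, "outer_ring": 30, "ball": 10}
data = {label: [(label, i) for i in range(n)] for label, n in counts.items()}

pairs = make_balanced_pairs(data, pairs_per_class=200)
anchors = Counter(a[0] for a, _, _ in pairs)    # class of the first pair member
total_possible = comb(sum(counts.values()), 2)  # distinct unordered pairs available
```

Although the ball class has only 10 samples, it anchors as many pairs as the normal class. The 120 original samples also admit far more distinct pairs than there are samples, which is where the data augmentation effect comes from.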

Conclusion

A Siamese CNN-BiLSTM model was proposed to locate and quantify the aeroengine rolling bearing damage with small and imbalanced samples. After multiple experimental comparisons, the following conclusions are drawn.

(1) When there are only 100 samples in the training set, the Siamese CNN-BiLSTM model achieves an accuracy of 96.0% for quantifying and 98.0% for locating rolling bearing faults. This model is capable of effectively diagnosing faults with small samples.

(2) The Siamese CNN-BiLSTM model enables rolling bearing fault diagnosis to transfer between different rotating speeds and different working conditions. Compared with conventional single models, the Siamese CNN-BiLSTM model has a higher transfer accuracy, demonstrating that it has better generalization performance.

(3) The SN can balance the number of samples through the combination of sample pairs, so it is more accurate in diagnosis than the CNN-BiLSTM model.

These results stem from two major advantages of the Siamese network: metric learning, and the extraction and combination of sample pairs. Combined with the strength of the CNN-BiLSTM in extracting features from rolling bearing vibration time series, these advantages make the model well suited to rolling bearing fault diagnosis with a small number of imbalanced samples. However, the Siamese network also has certain limitations. First, although it achieves data expansion through sample recombination and comparison, the sequential comparison also reduces recognition speed. Second, although the experiments show that the Siamese network is more accurate than the ordinary network in imbalanced fault diagnosis with few samples, its fault diagnosis accuracy is only about 80% when the number of samples is relatively small. This is an improvement for the diagnosis of actual rolling bearing faults, but it is still insufficient. Further research is needed on how to improve the diagnostic ability of small-sample networks and make them more suitable for real service conditions.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is sponsored by National Science and Technology Major Project of China (J2019-IV-004-0071) and National Natural Science Foundation of China (52272436).

ORCID iD

Xiyang Liu

References

1. Tao. Fault diagnosis of rolling bearing based on resonance-based sparse signal decomposition with optimal Q-factor. Meas Control 2019; 52(7-8): 1111–1121.

2. Ren, Kang, et al. Application of continuous potential function stochastic resonance in early fault diagnosis of rolling bearings. Meas Control 2020; 53(5-6): 767–777.

3. Chen, et al. Rolling bearing intelligent fault diagnosis method based on IPSO-WCNN. Meas Control 2023; 56(3-4): 681–893.

4. Wang, Cheng. A combination of residual and long–short-term memory networks for bearing fault diagnosis based on time-series model analysis. Meas Sci Technol 2021; 32(1): 015904.

5. Khan, Kim, Choo. Intelligent fault detection using raw vibration signals via dilated convolutional neural networks. J Supercomput 2020; 76(10): 8086–8100.

6. Zhang. Data privacy preserving federated transfer learning in machinery fault diagnostics using prior distributions. Struct Health Monit 2022; 21(4): 1329–1344.

7. Zhang. Federated transfer learning for intelligent fault diagnostics using deep adversarial networks with data privacy. IEEE/ASME Trans Mechatron 2022; 27(1): 430–439.

8. Tang, Lin, et al. A simple data augmentation algorithm and a self-adaptive convolutional architecture for few-shot fault diagnosis under different working conditions. Measurement 2020; 156: 1–17.

9. Chai, Yin. Autoencoder embedded dictionary learning for nonlinear industrial process fault diagnosis. J Process Control 2021; 101: 24–34.

10. Berenji, Taghiyarrenani, Rohani Bastami. Fault identification with limited labeled data. J Vib Control. Epub ahead of print 17 March 2023. DOI: 10.1177/10775463231164445

11. Tian, Zhao, Huang. Meta-learning approaches for learning-to-learn in deep learning: a survey. Neurocomputing 2022; 494: 203–223.

12. Drigas, Mitsea, Skianis. Meta-learning: a nine-layer model based on metacognition and smart technologies. Sustainability 2023; 15(2): 1668.

13. Jomaa, Schmidt-Thieme, Grabocka. Dataset2Vec: learning dataset meta-features. Data Min Knowl Discov 2021; 35(3): 964–985.

14. Zheng, Liu, Yin. Research on image classification method based on improved multi-scale relational network. PeerJ Comput Sci 2021; 7: e613.

15. Bromley, Bentz, Bottou, et al. Signature verification using a Siamese time delay neural network. Int J Pattern Recognit Artif Intell 1993; 7(4): 669–688.

16. Ondrasovic, Tarabek. Siamese visual object tracking: a survey. IEEE Access 2021; 9: 1–24.

17. Ntwari, Park, Shin, et al. SNS-CF: Siamese network with spatially semantic correlation features for object tracking. Sensors 2020; 20(17): 4881.

18. Dinh, Thanh. Vietnamese sentence paraphrase identification using pre-trained model and linguistic knowledge. Int J Adv Comput Sci Appl 2021; 12: 796–806.

19. Liu, Chen, Hao, et al. A combined deep learning model for damage size estimation of rolling bearing. Int J Engine Res 2023; 24(4): 1362–1373.

20. Pham, Kim. Rolling bearing fault diagnosis based on improved GAN and 2-D representation of acoustic emission signals. IEEE Access 2022; 10: 78056–78069.

21. Krestinskaya, James. Analogue neuro-memristive convolutional dropout nets. Proc R Soc A Math Phys Eng Sci 2020; 476(2242): 1–20.

22. Xiang, Shuo, Xiao, et al. Understanding the disharmony between dropout and batch normalization by variance shift. arXiv, 2018: 9.

23. Cheng, Zhang, et al. Dynamic Mosaic algorithm for data augmentation. Math Biosci Eng 2023; 20(4): 7193–7216.

24. Goodfellow, Pouget-Abadie, Mirza, et al. Generative adversarial nets. In: Proceedings of the 2014 Conference on Advances in Neural Information Processing Systems 27. Montreal, Canada: Curran Associates, Inc., 2014, pp.2672–2680.

25. Chiu, Feng, et al. Few-shot named entity recognition via meta-learning. IEEE Trans Knowl Data Eng 2022; 34(9): 4245–4256.

26. Randall (ed.). Vibration-based condition monitoring: industrial, aerospace and automotive applications. 2nd ed. New York, NY: John Wiley & Sons, 2021, pp.365.

27. Sawalhi, Randall. Vibration response of spalled rolling element bearings: observations, simulations and signal processing techniques to track the spall size. Mech Syst Signal Process 2011; 25(3): 846–870.

28. Lan, Zhang, et al. Short-term traffic flow prediction based on the optimization study of initial weights of the attention mechanism. Sustainability 2023; 15(2): 1374.

29. Eshghi, Au-Yeung, Takahashi, et al. Quantitative comparison of conventional and t-SNE-guided gating analyses. Front Immunol 2019; 10: 1194.