Research on fault diagnosis of rolling bearings in roller-to-roller printing units based on Siamese network

Abstract

To diagnose faults of rolling bearings in a real roller-to-roller (R2R) printing unit, a method based on the Siamese Network is proposed in this work. First, vibration signals of the rolling bearing are converted into a series of time–frequency spectra with the Continuous Wavelet Transform (CWT), so that the change of frequency components with time at various scales is represented as images. Siamese Networks with sub-nets composed of either a Convolutional Neural Network (CNN) or a Depth-wise Separable Convolution Network (DSCN) are then established for fault diagnosis (SN-CNN and SN-DSCN); meanwhile, fault samples are grouped into sample pairs to address the small-sample problem. On a public rolling bearing database, different kinds of faults with various severities, rotational speeds, and added noise levels were distinguished successfully by both SN-CNN and SN-DSCN. An experiment on an R2R unit in a printing press was then carried out, in which 7 classes of samples had to be identified and each class contained only a few samples. The results show that both SN-CNN and SN-DSCN can diagnose faults of rolling bearings in printing units from 20 samples per class, which can be regarded as a limited-sample learning task. In addition, SN-DSCN requires less training time than SN-CNN.

Keywords

Siamese network; fault diagnosis; rolling bearing; depth-wise separable convolution network; convolutional neural network

Introduction

As an important part of the manufacturing industry, the printing industry plays an irreplaceable role in the fields of education and scientific research, commodity consumption, cultural communication, packaging and transportation, and finance. In recent years, with the rapid development of printing machines and printing materials, a series of new printing machines have shown great potential in many fields, such as flexible electronic device manufacturing, high-precision functional material coating, and functional material surface performance improvement. Although there are now many different kinds of printing machines, a similar basic structure, the Roller-to-Roller (R2R) printing unit, can be found in all of them as the core part of the printing equipment. As shown in Figure 1, current printing equipment is always composed of one or more R2R units, which can be divided into offset printing units, gravure printing units, flexographic printing units, screen printing units, coating units, and many other types. The working principles of the different types of printing units are basically similar. The pattern to be printed is made on the printing plate, and the printing plate is installed on the surface of the plate cylinder. Ink is uniformly attached to the surface of the printing plate. Under the pressure between the impression cylinder and the plate cylinder, the ink is transferred onto the substrate in the predetermined pattern, and a printed product is obtained.

Figure 1.

A series of R2R printing units.

The R2R unit is mainly composed of a large number of high-speed rotating rollers, such as impression cylinders, plate cylinders, rubber rolls, anilox rollers, and ink rollers. Therefore, many rolling bearings are used in this kind of unit, and high-speed, stable printing production depends on these rolling bearings to a large degree. Due to factors involved in printing such as high pressure between rollers, high-speed motion, and chemical solvents, the working conditions of the bearings are relatively poor. Any slight failure or damage in the machine will cause defects in the printed product. Therefore, research on fault diagnosis in printing plays an important role in improving the reliability of printing.

Recently, mechanical fault diagnosis technology has been developing rapidly with artificial intelligence. The machine learning models used in early fault diagnosis always relied on manually processed signals and manually extracted features.^1,2 This kind of method depends on human experience to a large degree. With the development of deep learning, fault diagnosis is making great progress. Owing to the strong abilities of deep learning in model training and data mining, fault diagnosis can be realized more easily and correctly with far fewer fault samples and little human experience.³

Early fault diagnosis models always contain two steps, signal processing and fault classification. In signal processing, many classical methods can be chosen to obtain effective fault features, such as Empirical Mode Decomposition, Wavelet Transform, and Fast Fourier Transform.^4-7 In fault classification, models with strong nonlinear learning abilities are always used, such as Support Vector Machine, Artificial Neural Networks, and Naive Bayes.^8-10 These early methods are easily restrained and affected by empirical interference, and they are limited in many areas and working conditions. With the development of deep learning, there has been an obvious change in fault diagnosis. Methods based on deep learning mainly include Convolutional Neural Networks, Generative Adversarial Networks, Long Short-term Memory Networks, Deep Belief Networks, and some improved networks based on them.^11-16 In order to improve the speed of fault diagnosis, Shuffle Network and Mobile Network have been adapted for fault diagnosis.^17,18 Meanwhile, unsupervised methods such as Parallel Sparse Filtering and Auto Encoder play increasingly important roles in the diagnosis of rotating machinery under strong noise and varying rotational speeds.^19-21 New types of networks have also been developed for sparse fault samples, such as Caps Network and Prototypical Network.^22,23

For printing units, there are some special problems such as the small number of fault samples, weak fault signals, and strong environmental noise. To diagnose faults of rolling bearings in a real R2R printing unit, a method based on Wavelet Transform and Siamese Network is proposed in this work.

The main contents are shown in Figure 2 and are as follows. First, the vibration signal is transformed into a time–frequency image based on the Continuous Wavelet Transform (CWT) to obtain time–frequency information of various faults. Then, the fault diagnosis model of the Siamese Network is constructed, and the network is used to amplify the small differences between similar fault samples so that the model can be trained with small samples. In order to simplify the parameters of the deep learning model and improve the efficiency of calculation, both a Convolutional Neural Network (CNN) and a Depth-wise Separable Convolution Network (DSCN) are introduced as sub-nets, with DSCN used to reduce the number of model parameters and calculations while keeping a high accuracy. Finally, the effectiveness of the proposed method is verified with a fault database of rolling bearings, and then the method is applied successfully to fault diagnosis of rolling bearings in two different R2R printing units.

Figure 2.

Flowchart of fault diagnosis.

The main novelties and contributions of this paper are summarized as follows.

Two deep learning models with a parallel (twin) structure, based on the Convolutional Neural Network and the Depth-wise Separable Convolution Network, are proposed and named SN-CNN and SN-DSCN.

A fault diagnosis method based on the continuous wavelet transform and the proposed models is given, so that the advantages of CNN and DSCN in image recognition can be exploited.

Fault diagnosis of rolling bearings in R2R printing units with small samples is realized with high accuracy by the given methods in experiments.

The remaining parts of this paper are organized as follows. Section 2 reviews the method used to process fault signals; the theoretical background of the Continuous Wavelet Transform and its detailed calculation are described there. In Section 3, a Siamese Network is established, and a new Siamese Network structure for fault diagnosis is proposed and analyzed. Both DSCN and CNN are introduced into the Siamese Network as sub-nets to enhance the ability of training. In Section 4, three groups of experiments are carried out to diagnose faults of rolling bearings, including two kinds of samples from real R2R printing units. With the advantages of the Siamese Network, a high accuracy is obtained from small fault samples under different noise levels. The given method is shown to be effective, and conclusions are given in Section 5.

The processing of fault signals

The continuous wavelet transform (CWT) is used here to obtain images from vibration signals, and these time–frequency spectra can be used effectively for training CNN models. In our earlier work on fault diagnosis of rolling bearings, CWT gave effective results with CNN models. Therefore, it is chosen here for the Siamese Network with CNN structures.²⁴

Continuous wavelet transform

Fault signals can be analyzed on different scales by basic wavelet functions. The continuous wavelet transform is defined as equation (1), where $f (x)$ is the signal to be analyzed

W_{f} (a, τ) = \frac{1}{\sqrt{a}} \int_{R} f (x) \overline{ψ \left( \frac{x - τ}{a} \right)} d x

(1)

Here $ψ (t)$ is a continuous function called the mother wavelet function, or simply the wavelet function, and the overline denotes the complex conjugate. A scale transformation is introduced into $ψ (t)$ by taking $a$ as the scale factor and $τ$ as the time shift. Thus, $ψ_{a τ} (x) = \frac{1}{\sqrt{a}} ψ \left( \frac{x - τ}{a} \right)$ is called a wavelet basis function in CWT. A larger value of $a$ always means a wider scale in time analysis and a narrower scale in frequency analysis.

In CWT, the high-frequency information in the signal is usually contained in the fast-changing part of the time domain, such as steep leading edges and sharp pulses. For this part of the signal, sufficient time resolution is necessary to follow the fast changes in short intervals. In contrast, the low-frequency information is contained in the slow-changing part, and therefore a higher frequency resolution is required. The mother wavelet function always decays quickly and has a finite length, so a series of wavelet basis functions can be obtained from the mother wavelet function with different values of time shift and scale factor.^25,26

Information at different scales of the time–frequency relationship can be obtained with different mother wavelet functions. Considering that the Morlet wavelet basis has a strong ability in time–frequency analysis, it is chosen as the wavelet basis in the following work.^27,28 The Morlet wavelet basis used here can be expressed as equation (2)

ψ (η) = π^{- 1 / 4} e^{i k η} e^{- η^{2} / 2}

(2)

where $k$ controls the number of oscillations in the mother wavelet. The fault signals of the rolling bearing from a roller-to-roller unit can then be processed by CWT with equation (2).
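As a rough illustration of equations (1) and (2), the following Python sketch evaluates the transform directly as a discrete sum. It is a minimal sketch and not the implementation used in this work: the function names, the value $k = 6$, and the plain-Python loops are assumptions made for readability, and a practical implementation would normally use an FFT-based routine from a numerical library.

```python
import cmath
import math


def morlet(eta, k=6.0):
    """Morlet mother wavelet of equation (2).

    k = 6 is an assumed value for illustration; the source does not state k.
    """
    return math.pi ** -0.25 * cmath.exp(1j * k * eta) * math.exp(-eta * eta / 2.0)


def cwt_morlet(signal, scales, dt, k=6.0):
    """Evaluate equation (1) as a Riemann sum.

    Returns coeffs[i][j], the coefficient at scale scales[i] and
    time shift tau = j * dt. The cost is O(len(scales) * n^2).
    """
    n = len(signal)
    coeffs = []
    for a in scales:
        row = []
        for j in range(n):            # time shift tau = j * dt
            acc = 0j
            for m in range(n):        # integration variable x = m * dt
                eta = (m - j) * dt / a
                acc += signal[m] * morlet(eta, k).conjugate()
            row.append(acc * dt / math.sqrt(a))
        coeffs.append(row)
    return coeffs
```

Taking the magnitude of each coefficient gives one row of the time–frequency image per scale. For a pure tone of frequency $f$, the magnitude is largest near the scale $a ≈ k / (2 π f)$, which is how scales are related to frequencies in the image.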

Realization of continuous wavelet transform

CWT has a good property of time–frequency localization. With the help of a series of wavelet basis functions, it has unique advantages in processing transient signals.²⁹ A given signal can be analyzed at different time and frequency scales with CWT, and a time–frequency image can be obtained from the result.

According to the relationship between scale and frequency, a time–frequency image reflects the change of frequency components with time at various scales. Since different faults change the distribution of frequency components in the signals, their time–frequency diagrams also differ. Thus, fault diagnosis can be treated as pattern recognition on images, and deep learning models can be used to perform this image recognition.

As shown in Figure 3, a vibration signal of a normal rolling bearing is converted into a time–frequency diagram based on CWT with the Morlet wavelet.

Figure 3.

The transform based on CWT.

The deep network for fault diagnosis

Two deep learning models, named SN-CNN and SN-DSCN, are proposed, and a fault diagnosis method based on the continuous wavelet transform is given in this part.

The structure of Siamese Network

Small differences between fault samples can be amplified by the Siamese Network (SN), and this property is very useful in early bearing fault diagnosis. Siamese neural networks employ a unique structure to naturally rank similarity between inputs. In general, an SN is composed of two sub-networks. These twin networks accept distinct inputs but are joined by an energy function at the top, which computes some metric between the highest-level feature representations on each side. The parameters of the twin networks are tied, so the structure of the network is symmetric.

The structure of the Siamese Network used here is shown in Figure 4. Two identical sub-networks are established and then combined to form the whole model. In Siamese Networks, the weights are shared during training, and the results from the two sub-networks are evaluated by a distance function. The properties of an SN are decided by the structure of the sub-network to a large degree, so it is important to choose a suitable structure for a specific task. Compared with a single neural network, the expanded topology of the Siamese Network increases the amount of calculation.

Figure 4.

The structure of Siamese Network.

For the fault diagnosis of rolling bearings in printing units, fault samples from the equipment are limited due to the cost of experiments. Considering that the samples are converted into images in Section 2, a CNN can be chosen since it has a strong ability in image recognition. However, with the small number of samples in the following fault diagnosis, a CNN alone cannot realize the classification.

A Siamese Network with a CNN sub-net is suitable for this small-sample problem, but its larger amount of calculation must be dealt with first. To address this problem, a DSCN is then introduced to form the sub-nets. DSCN can be seen as an improvement of CNN, and it reduces the model size and computational cost to some degree. Both structures are considered in the following work.

The principle of Siamese Network

As shown in Figure 4, the whole learning process can be divided into four steps: input, feature extraction, distance measurement, and fault recognition. The whole Siamese network consists of two identical feature extraction sub-networks.

Once a pair of samples ( $X_{1}$ , $X_{2}$ ) is put into the two feature extraction networks, two feature outputs, $f (X_{1})$ and $f (X_{2})$ , are generated, respectively. The weights $w$ are shared between the two feature extraction networks. The feature vectors obtained from the twin sub-networks are then input into a layer that computes the distance between them. Unlike traditional classification networks, the input of the proposed network is a sample pair which contains two samples from the same class or from different classes. For a classification task with n categories, take m training samples for each category as $x_{1}, x_{2}, \dots, x_{m}$ . The number of sample pairs from the same category can be calculated as equation (3)

T_{s} = \frac{n \times m \times (m - 1)}{2}

(3)

and the number of sample pairs from different categories as equation (4)

T_{d} = \frac{n \times (n - 1) \times m \times m}{2}

(4)

For example, n = 7 and m = 8 give 196 pairs from the same category and 1344 pairs from different categories. Adopting this strategy can significantly increase the amount of data used to train the model and effectively reduce the reliance on the number of fault samples.

In this paper, the L1 distance is adopted as the distance metric. The similarity of a sample pair ( $X_{1}$ , $X_{2}$ ) can be calculated from equation (5)

E (f (X_{1}), f (X_{2})) = \exp (- {‖ f (X_{1}) - f (X_{2}) ‖}_{1}) \in (0, 1]

(5)

where $E (f (X_{1}), f (X_{2}))$ is a similarity score derived from the L1 distance between $f (X_{1})$ and $f (X_{2})$ : it equals 1 when the two feature vectors coincide and decreases toward 0 as the distance grows.
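The twin structure and equation (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `embed` is a hypothetical one-layer stand-in for the CNN or DSCN sub-net, and the weight matrix in the usage note is arbitrary. The point it shows is that the same weights are applied to both inputs before the L1-based score is computed.

```python
import math


def embed(x, weights):
    """Toy shared sub-network: one linear layer followed by ReLU.

    A hypothetical stand-in for the CNN/DSCN feature extractor f(.).
    """
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def l1_similarity(f1, f2):
    """Equation (5): exp(-||f1 - f2||_1), a value in (0, 1]."""
    l1 = sum(abs(u - v) for u, v in zip(f1, f2))
    return math.exp(-l1)


def siamese_score(x1, x2, weights):
    """Pass both inputs through the SAME weights, then compare the features."""
    return l1_similarity(embed(x1, weights), embed(x2, weights))
```

Calling `siamese_score(x, x, weights)` with any input returns 1.0, because identical inputs produce identical features; inputs whose features differ give a score strictly between 0 and 1.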

Depth-wise Separable Convolution Network

DSCN and global max pooling are adopted to reduce the number of model parameters, which also improves the calculation speed. A standard convolution can be decomposed into a depth-wise convolution and a pointwise convolution, as shown in Figure 5. The effectiveness of DSCN in reducing the number of parameters can be seen by comparing it directly with a CNN sub-net.

Figure 5.

Depth-wise separable convolution.

A CNN layer takes an image with a size of $D_{i} \times D_{i} \times M$ as input and produces output feature maps with a size of $D_{o} \times D_{o} \times N$ by convolving with a kernel of size $D_{k} \times D_{k}$ .³⁰ Taking the output maps to have the same spatial size as the input, the computational cost (number of multiplications) of such a standard convolution layer can be counted as $D_{k} \times D_{k} \times M \times N \times D_{i} \times D_{i}$ , while its number of weights is $D_{k} \times D_{k} \times M \times N$ . The computational cost of the corresponding depth-wise separable convolution layer is $D_{k} \times D_{k} \times M \times D_{i} \times D_{i} + M \times N \times D_{i} \times D_{i}$ , with $D_{k} \times D_{k} \times M + M \times N$ weights. Here $D_{i}$ , $D_{i}$ , and M are the width, height, and number of input channels, respectively, and $D_{o}$ , $D_{o}$ , and N are the width, height, and number of output channels, respectively.
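To make the two expressions above concrete, the short Python sketch below evaluates them and their ratio. The function names and the example layer size are assumptions for illustration only. Dividing the separable expression by the standard one gives $1 / N + 1 / D_{k}^{2}$, so with a 3×3 kernel the separable layer needs roughly one ninth of the work when N is large.

```python
def standard_conv_cost(dk, m, n, d):
    """D_k * D_k * M * N * D_i * D_i for a standard convolution layer."""
    return dk * dk * m * n * d * d


def separable_conv_cost(dk, m, n, d):
    """Depth-wise part (D_k * D_k * M * D_i * D_i) plus pointwise part (M * N * D_i * D_i)."""
    return dk * dk * m * d * d + m * n * d * d


def cost_ratio(dk, m, n, d):
    """Separable cost divided by standard cost; equals 1/N + 1/D_k^2."""
    return separable_conv_cost(dk, m, n, d) / standard_conv_cost(dk, m, n, d)
```

For example, `cost_ratio(3, 64, 128, 21)` evaluates to 1/128 + 1/9, about 0.12, independent of the number of input channels and of the feature map size.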

Meanwhile, global max pooling is used here. As shown in Figure 6, it computes only a single maximum value for each input channel. In a CNN model, the fully connected layer following the Flatten layer has a large number of parameters, which can easily cause over-fitting. By replacing the Flatten layer with a global max pooling block, the input to the fully connected block becomes much smaller, so the fully connected block needs far fewer parameters.

Figure 6.

Global max pooling.
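Global max pooling as described above can be written as a one-line reduction. The sketch below is illustrative only; the nested-list layout `feature_maps[channel][row][col]` and the function name are assumptions.

```python
def global_max_pool(feature_maps):
    """Return one value per channel: the maximum over all spatial positions.

    feature_maps[c][i][j] is the activation of channel c at position (i, j).
    """
    return [max(max(row) for row in channel) for channel in feature_maps]
```

A stack of 128 feature maps is therefore reduced to a vector of 128 values regardless of the spatial size of each map, whereas flattening keeps every spatial position as a separate input to the fully connected layer.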

In the training process, the weights of the networks are adjusted by minimizing a loss function. In this work, the Cross-Entropy Loss is selected and defined as equation (6).

\mathrm{loss} = - \frac{1}{T} \sum_{t = 1}^{T} [y_{t} \times \log (h_{θ} (x_{t})) + (1 - y_{t}) \times \log (1 - h_{θ} (x_{t}))]

(6)

Here T represents the number of training examples, $y_{t}$ represents the target label of training example t, $x_{t}$ is the input of training example t, and $h_{θ}$ represents the neural network model with weights $θ$ .
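Equation (6) can be checked numerically with a small Python sketch. The function name and the clipping constant `eps` are assumptions added here to avoid taking the logarithm of zero; they are not part of the definition above.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy of equation (6).

    y_true holds target labels (0 or 1) and y_pred holds model outputs in [0, 1].
    Predictions are clipped to [eps, 1 - eps] (an assumption) to keep log finite.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

An uninformative output of 0.5 for every pair gives a loss of $\log 2$, and the loss approaches 0 as the outputs approach the target labels.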

Besides a Siamese Network with two CNN sub-nets (SN-CNN), a model based on a Siamese Network with two DSCN sub-nets (SN-DSCN) is also established for fault diagnosis in this work. Its specific structure is shown in Figure 7, where the main stages are labeled. The time consumption of the two models is also analyzed and compared.

Figure 7.

The SN-DSCN model for fault diagnosis.

Fault diagnosis with SN-DSCN and SN-CNN

A fault diagnosis method based on the proposed SN-CNN and SN-DSCN models with CWT is now given. Fault signals are converted into a series of images by CWT, so that the advantages of CNN and DSCN in image recognition can be fully exploited.

The given method is used for the fault diagnosis of rolling bearings in both a rotor system and an R2R unit. The flow chart of fault diagnosis is shown in Figure 8. To realize the fault diagnosis, training and verification data are obtained and converted into time–frequency spectra to establish the models. Unknown signals to be identified are then put into the trained models to classify their fault classes.

Figure 8.

The proposed method for fault diagnosis.

Experiment

In order to verify the effectiveness of the SN-DSCN and SN-CNN models proposed in this paper, fault samples from the Bearing Data Center of Case Western Reserve University are used first. Then faulty bearing samples from a printing unit are used to show the effect in practical operation.

Bearing Data Center

The experimental equipment is shown in Figure 9. A motor is on the left side, a dynamometer is on the right side, and a torque transducer is between them. Motor bearings were seeded with faults using electro-discharge machining. The faulty SKF6205 bearings were reinstalled into the test motor, and the vibration signals were recorded with a sampling frequency of 12 kHz. Both normal bearings and faulty bearings were used. Three kinds of faults, in the outer raceway, the inner raceway, and the rolling element, were studied with fault sizes of 0.178 mm, 0.356 mm, 0.533 mm, and 0.711 mm (the outer raceway without 0.711 mm).

Figure 9.

The experimental equipment.

Bearing failures under noisy environment

The bearing data were obtained in a relatively ideal environment. In order to simulate a complex industrial environment for fault diagnosis, Gaussian white noise is added to the data manually since it is close to real noise. The bearing data under a motor load of 0 horsepower are selected, and Gaussian white noise is added at different signal-to-noise ratios (SNR).

A comparison between samples from the original signal and the signal with added Gaussian white noise at 2 dB is shown in Figure 10, with the inner raceway signal chosen as an example. The noisy signal is harder to interpret, since the fault-related amplitudes are less obvious than in the original one. The added noise is converted into the image together with the vibration signal, which is a disadvantage for the models. Noise at different levels was considered, including −4 dB, −2 dB, 2 dB, 4 dB, 6 dB, 8 dB, and 10 dB.

Figure 10.

Comparison between the inner raceway signal and the signal with added noise.
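Adding Gaussian white noise at a prescribed SNR can be sketched as follows. This is a minimal illustration, not the procedure used in the experiments: the function name is hypothetical, and the SNR is assumed to be defined as $10 \log_{10}$ of the ratio of mean signal power to noise variance.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng=None):
    """Return signal plus zero-mean Gaussian white noise at the given SNR in dB.

    Assumes SNR = 10 * log10(P_signal / P_noise), with P_signal the mean
    squared amplitude of the clean signal.
    """
    if rng is None:
        rng = random.Random()
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]
```

With this definition, a negative SNR such as −4 dB means that the noise power exceeds the signal power, while 10 dB means the signal power is ten times the noise power.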

For each SNR, 12 groups of samples were used: the normal state and three kinds of faults (outer raceway, inner raceway, and rolling element) with various sizes (0.178 mm, 0.356 mm, 0.533 mm, and 0.711 mm; the outer raceway without 0.711 mm). It should be pointed out that there are only 23 samples in each group, so the total number of samples for each SNR is 276, divided into 207 samples for training and 69 samples for verification. This number of samples is relatively small and makes it hard to train a model such as a plain CNN, but the fault diagnosis can still be realized with the SN.

In this part, the number of iterations is set to 20 and the batch size is set to 32. From Figure 11 it can be seen that the accuracy reaches a high level of over 99.5% when the number of iterations reaches 20, and the improvement with a larger value is not obvious, so it is set to 20. Batch size is also a key parameter in model training, and an ideal selection balances efficiency against memory capacity. If the batch size is too small, the training will have difficulty converging, which leads to under-fitting. Processing is faster with a larger value, but the required memory capacity also increases quickly and the result is more likely to fall into a local optimum. Besides, a GPU performs better with a batch size that is a power of 2, such as 16, 32, or 64. Considering the time consumption and computing power of the workstation, the batch size in this paper is finally set to 32.^31,32

Figure 11.

The relationship between accuracy and iteration without added noise.

With the trained model, 480 samples, independent of the training and verification data, were used for fault diagnosis. To analyze the diagnosis results more specifically, the confusion matrix is used to evaluate the performance of the proposed model.^33,34 The confusion matrix reflects the overall relationship between the real classes and the classes predicted in model testing. The matrix for the maximum SNR of 10 dB is shown in Figure 12, which is taken as the most difficult condition in Table 1; rows give the true labels and columns give the predicted labels.

Figure 12.

Confusion matrices under different SNR. (a) SN-CNN (b) SN-DSCN.

Table 1.

Diagnosis results of different SNR.

SNR (dB)	−4	−2	0	2	4	6	8	10
SN-CNN (%)	99.20	99.02	100.00	98.28	99.72	99.70	99.86	99.24
SN-DSCN (%)	98.75	99.03	100.00	99.60	99.33	98.50	99.06	99.67

In both SN-CNN and SN-DSCN, most of the samples lie on the diagonal, which represents a correct diagnosis. The SN-DSCN model in Figure 12(b) gives a better result, with an accuracy of 96.67% and a precision of 97.02%, compared with 94.58% and 95.22% for the SN-CNN model, respectively. The confusion matrices show that samples with different sizes of inner raceway fault are difficult to tell apart; however, inner raceway faults can still be distinguished from other faults.

Bearing failure under variable speed

Fault samples under different speeds were also used. The results of fault diagnosis under motor loads of 1 to 3 HP are recorded in Table 2. In this part, the same 12 groups of samples as in part 4.1.1 were used for each horsepower.

Table 2.

Diagnosis results under different rotational speeds.

Horsepower	1 (%)	2 (%)	3 (%)
SN-CNN	100.00	98.83	99.59
SN-DSCN	98.90	98.22	98.40

It can be seen that the accuracy of SN-CNN is higher than that of SN-DSCN, but the accuracy of the SN-DSCN model is still above 98%, which basically achieves a trade-off between speed and accuracy. Overall, both of them prove to be effective with high accuracy.

Fault diagnosis in offset printing units

Experiment description

Two groups of experiments were carried out on two kinds of R2R units from offset printing machines (GUANGHUA 605, made in China). The faults were again introduced by electro-discharge machining. For the first group of data, NSK 6001Z rolling bearings were mounted on both ends of the ink rollers. Bearings in a printing press are easily damaged, since the printing process involves a large number of chemical solvents such as ink, alcohol, and embossing liquid. The printing pressure produced by the contact between rollers also shortens their service life.³⁵

The testing system for offset printing is shown in Figure 13. A mobile workstation, a data acquisition device, and acceleration sensors were used to obtain vibration signals. The rotational speed of the bearings was 4 rad/s and the sampling frequency was 3200 Hz. The main parameters selected for this experiment are recorded in Table 3. Three fault sizes, 0.1 mm, 0.3 mm, and 0.5 mm, were made on the inner raceway and the outer raceway of the bearings. Together with the normal samples, there are 7 groups of samples in total.

Figure 13.

The testing system of offset printing.

Table 3.

Main parameters in experiment.

Outer diameter (mm)	Inner diameter (mm)	Pitch diameter D (mm)	Ball diameter d (mm)	Ball number Z	Contact angle $α$
28	12	20	4.763	8	$0^{\circ}$

Fault diagnosis in this work belongs to few-shot learning, for which training the network on the full data set in each iteration is suitable. The direction determined by the full data set represents the population better, so the gradient moves more easily toward the extreme value.

In the training process, the learning rate is sometimes modified automatically to speed up training. Twenty samples are taken from each group to form positive sample pairs and negative sample pairs. In order to obtain a better result, the Adam, Adamax, RMSprop, and Nadam optimizers are tested and compared.

In Figure 14, the training and verification accuracy with all optimizers shows an overall increasing trend as the number of iterations increases. Among these algorithms, the Nadam optimizer gives a higher accuracy and is chosen in the following work. The abrupt drop at about 90 iterations with the Nadam optimizer looks obvious in Figure 14(b) because the Y-axis is enlarged to some degree; the accuracy at the drop is still about 91%. The drop is caused by the settings of learning rate and batch size. When the learning rate is too large, the fluctuation is more obvious since the step in gradient descent is too large. The effect of batch size is hard to describe clearly, and a smaller one always leads to a smoother result. Besides, such drops can be overcome with enough iterations, and the accuracy can still exceed 99% when the number of iterations is sufficient.

Figure 14.

Influence of different optimizers on accuracy. (a) Accuracy in training data (b) Accuracy in verifying data.

Result with different sample sizes

Besides realizing the fault diagnosis, the influence of sample size is also considered, and four data sets with different sizes are used in this paper. Both kinds of SN models were used in this part.

Dataset A: Select 8 time–frequency diagrams from each of the 7 classes of samples to form 196 pairs of positive samples and 1344 pairs of negative samples.

Dataset B: Select 12 time–frequency diagrams from each of the 7 classes of samples to form 462 pairs of positive samples and 3024 pairs of negative samples.

Dataset C: Select 16 time–frequency diagrams from each of the 7 classes of samples to form 840 pairs of positive samples and 5376 pairs of negative samples.

Dataset D: Select 20 time–frequency diagrams from each of the 7 classes of samples to form 1330 pairs of positive samples and 8400 pairs of negative samples.
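The pair counts listed for Datasets A to D can be reproduced by enumerating every unordered pair of samples and checking whether the two samples share a class. The sketch below is only a counting aid; the function name is hypothetical, and it assumes that every possible pair is used.

```python
from itertools import combinations


def count_pairs(n_classes, m_per_class):
    """Count positive (same-class) and negative (different-class) sample pairs."""
    labels = [c for c in range(n_classes) for _ in range(m_per_class)]
    positive = negative = 0
    for a, b in combinations(labels, 2):
        if a == b:
            positive += 1
        else:
            negative += 1
    return positive, negative
```

With 7 classes, `count_pairs(7, 8)` returns `(196, 1344)` and `count_pairs(7, 20)` returns `(1330, 8400)`, matching Datasets A and D; the number of usable pairs grows roughly with the square of the number of samples per class.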

The accuracy of bearing fault diagnosis under different sample sizes is shown in Figure 15. There are some obvious trends in Figure 15(a) and (b). First, the accuracy increases rapidly with the number of iterations and then stays at a high level of over 95% after 130 iterations of training. Meanwhile, both training and verification accuracy reach a high value more quickly and steadily with more samples, such as 16 and 20 samples. When the size is 8 or 12, the results fluctuate more frequently.

Figure 15.

Influence of different sample size on accuracy. (a) Accuracy of training data (b) Accuracy in verifying data.

However, the proposed method can still reach a high accuracy with small samples once the number of iterations is sufficient, and it proves to be effective in experiments. To achieve a good diagnosis result more quickly and steadily, at least 16 samples should be used for each kind of fault.

The trained model based on Dataset D was used to diagnose 280 unknown samples, covering 7 kinds of status with 40 samples each. This is twice the number of training samples. The confusion matrix of the fault diagnosis is shown in Figure 16. Overall, the accuracy of the predicted result is 94.64%, the precision is 95.35%, and the recall is 94.64%. Therefore, the model is able to identify faults of different sizes. The inner raceway fault with a size of 0.5 mm is sometimes classified as normal, and its accuracy is lower than that of the other samples. Still, the overall accuracy is close to 95% for the 7 kinds of samples. Therefore, the proposed method can be used effectively in practical applications with limited samples.

Figure 16.

Confusion matrices under different sample sizes (SN-DSCN).
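Accuracy, precision, and recall can be derived from a confusion matrix as sketched below. The source does not state how the per-class values are averaged, so macro-averaging (the unweighted mean over classes) is an assumption here, as is the function name. Rows are taken as true labels and columns as predicted labels.

```python
def confusion_metrics(cm):
    """Return (accuracy, macro precision, macro recall) for a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    Macro-averaging is an assumption; the averaging used in the paper is not stated.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    precisions, recalls = [], []
    for i in range(k):
        predicted_i = sum(cm[r][i] for r in range(k))   # column sum
        actual_i = sum(cm[i])                           # row sum
        precisions.append(cm[i][i] / predicted_i if predicted_i else 0.0)
        recalls.append(cm[i][i] / actual_i if actual_i else 0.0)
    return accuracy, sum(precisions) / k, sum(recalls) / k
```

When every class has the same number of test samples, as with 40 samples for each of the 7 classes, macro-averaged recall coincides with accuracy, which is consistent with the identical values of 94.64% reported for both.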

Comparison of calculation time

To illustrate the advantage in reducing parameters, a comparison is given between SN-DSCN and SN-CNN. The detailed parameters of the two kinds of models are recorded in Tables 4 and 5. The input image is assumed to have a size of 64×64, and the fully connected layers are set to have the same output size.

Table 4.

Parameters in SN-DSCN model.

Layer	Kernel size/number	Output size	Padding	Parameters
Depth-wise 1	(3×3)/64	64×64×64	Same	286
Max pooling 1	(3×3)/64	21×21×64	Valid	0
Depth-wise 2	(3×3)/128	21×21×128	Same	8,960
Max pooling 2	(3×3)/128	7×7×128	Valid	0
Global max pooling	—	128	Valid	0
Fully connection	—	128	—	16,512
Total				25,758

Table 5.

Parameters in SN-CNN model.

Layer	Kernel size/number	Output size	Padding	Parameters
Conv 1	(3×3)/64	62×62×64	Same	1,792
Max pooling 1	(3×3)/64	20×20×64	Valid	0
Conv 2	(3×3)/128	18×18×128	Same	73,856
Max pooling 2	(3×3)/128	6×6×128	Valid	0
Flatten	—	4608	Valid	0
Fully connection	—	128	—	589,952
Total				665,600

Comparing Tables 4 and 5, it can be seen clearly that the total number of parameters of the CNN sub-net is about 26 times that of the DSCN sub-net (665,600 versus 25,758). Therefore, the computational consumption can be reduced to some degree.
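The totals in Tables 4 and 5 can be reproduced with a few lines of Python. The sketch rests on assumptions that are not stated in the text but that match the tabulated values: a 3-channel input image, a bias term in every convolution and fully connected layer, and a bias in both the depth-wise and the pointwise stage of each separable layer. The function names are hypothetical.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a standard k x k convolution."""
    return k * k * c_in * c_out + c_out


def separable_params(k, c_in, c_out):
    """Depth-wise stage (k*k*c_in + c_in) plus pointwise stage (c_in*c_out + c_out)."""
    return (k * k * c_in + c_in) + (c_in * c_out + c_out)


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def subnet_totals():
    """Return (CNN sub-net total, DSCN sub-net total) for the layers in Tables 4 and 5."""
    cnn = conv_params(3, 3, 64) + conv_params(3, 64, 128) + dense_params(4608, 128)
    dscn = separable_params(3, 3, 64) + separable_params(3, 64, 128) + dense_params(128, 128)
    return cnn, dscn
```

Most of the saving comes from the fully connected layer: flattening a 6×6×128 output feeds 4608 values into it, whereas global max pooling feeds only 128.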

To make a more specific analysis, SN-CNN and SN-DSCN were applied to the different data sets to compare calculation time and accuracy, and the result is recorded in Table 6. SN-DSCN needs less training time than SN-CNN on every data set, and the saving grows with the number of samples, reaching 48 s for Dataset C and 67 s for Dataset D. The time for SN-DSCN to diagnose one sample is between 119 ms and 127 ms, which basically meets the requirements of real-time fault diagnosis.

Table 6.

Comparison for time consumption.

Dataset	Training time (SN-CNN)	Training time (SN-DSCN)	Diagnosis time per sample (SN-CNN)	Diagnosis time per sample (SN-DSCN)
Dataset A	372 s	359 s	116 ms	124 ms
Dataset B	837 s	820 s	114 ms	119 ms
Dataset C	1498 s	1450 s	107 ms	123 ms
Dataset D	2346 s	2279 s	187 ms	127 ms
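The training-time savings can be read directly from Table 6. The short sketch below recomputes the absolute and relative savings from the tabulated training times; the variable names are illustrative.

```python
# Training times in seconds from Table 6: (SN-CNN, SN-DSCN).
training_seconds = {
    "Dataset A": (372, 359),
    "Dataset B": (837, 820),
    "Dataset C": (1498, 1450),
    "Dataset D": (2346, 2279),
}

savings = {}
for name, (cnn_s, dscn_s) in training_seconds.items():
    saved = cnn_s - dscn_s
    savings[name] = saved
    # Relative saving with respect to the SN-CNN training time.
    print(f"{name}: saved {saved} s ({100 * saved / cnn_s:.1f}% of SN-CNN time)")
```

The absolute saving grows with dataset size, while the relative saving stays at a few percent of the SN-CNN training time on every dataset.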

The experiment takes the wavelet time–frequency diagram as the input feature and uses the Siamese network with DSCN to realize fault diagnosis. For the rolling bearings of the offset R2R unit, the validation accuracy of the proposed model is 99.32%, and the test accuracy of the lightweight Siamese-network fault diagnosis model is 96.67%, which indicates that SN-DSCN achieves an accuracy similar to that of SN-CNN while reducing the number of parameters.

Judging from the parameter counts alone, DSCN should give a much clearer acceleration; however, the Siamese structure itself also consumes considerable computing resources. As the number of parameters grows in more complex problems, the advantage of SN-DSCN should become more significant. Overall, SN-DSCN still offers an improvement.

Conclusion

To realize fault diagnosis of rolling bearings in a real R2R printing unit, a method based on the Siamese Network is proposed to solve the small-sample learning problem. The fault signals are converted into time–frequency spectra by CWT, so that fault diagnosis can be treated as an image pattern recognition problem. However, this task is very difficult when there are not enough samples. Therefore, Siamese Networks based on both a Convolutional Neural Network and a Depth-wise Separable Convolution Network are constructed to address this problem. A series of fault diagnoses for rolling bearings was then carried out, and fault samples from the Bearing Data Center and from an R2R unit in a printing press were identified successfully. Both proposed models, SN-CNN and SN-DSCN, proved to be effective in small-sample learning with only very few samples.

In the two groups of experiments, faults in rolling bearings were identified by the SN with an accuracy over 96.67% using only about 20 training samples per status, which is hard to achieve with other deep learning models under the same conditions. Fault samples from a database with various fault degrees, rotary speeds and added noise were all diagnosed successfully. The influence of the optimizer algorithm and the sample size on the results was then compared and analyzed, and fault diagnosis for bearings in an R2R printing unit was realized with only 16 samples for training.

From this work, it can be seen that the SN model has a strong ability in fault diagnosis and can be used in an R2R printing unit under various conditions. Both SN-CNN and SN-DSCN can complete this task with 20 samples, which can be regarded as a limited-sample learning task.

Footnotes

Acknowledgments

We also thank Dr Heping Hou for providing experimental devices and suggestions for our work.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Key Research and Development Program of China (No.2019YFB1707203), the project of technology innovation and guidance in Shaanxi Province-the regional innovation guidance plan (No.2020QFY03-06), the China Postdoctoral Science Foundation (No.2019M663783), Natural Science Basic Research Plan in Shaanxi Province of China (Grant No. 2021JM-334) and Scientific Research Program of Shaanxi Provincial Department of Education (No.16JS081).

ORCID iD

Zhuofei Xu
