Abstract
The existing transfer diagnosis methods based on entropy minimization tend to converge to trivial solutions. To solve this problem, a deep diversity maximization-based adversarial transfer diagnosis approach for rotating machinery is presented in this paper. First, a deep convolutional neural network is used as the feature encoder to learn the characteristics of vibration signals under different working conditions. A diversity maximization strategy is adopted to balance entropy minimization and thus avoid trivial local minima, so that the categories predicted by the nontrivial domain adaptation method are more diverse. Moreover, entropy is used to evaluate the uncertainty of the classifier's predictions, and this entropy-based strategy is used to adjust the domain discriminator. The experimental study demonstrates the effectiveness of the developed method.
Introduction
Rotating machinery is widely used in important engineering fields such as aerospace, automobile manufacturing, rail transit, and wind power generation. Its working conditions are complicated and changeable, and mechanical equipment often breaks down during long periods of operation. Unexpected damage may lead to long downtime and high maintenance costs, and may even pose a serious threat to safety. It is therefore crucial to carry out condition monitoring and diagnosis to ensure the reliable, continuous, and stable operation of machinery.1,2
Deep learning has achieved great success in fault diagnosis owing to its strong feature learning ability. It provides an end-to-end solution for mechanical equipment fault diagnosis and jointly optimizes feature extraction and fault recognition without tedious manual feature extraction steps. Deep learning has built a bridge between mechanical big data and intelligent operation and maintenance, and has promoted the development of intelligent health monitoring of mechanical equipment. Models such as the deep belief network, 3 stacked auto-encoder, 4 convolutional neural network (CNN), 5 and recurrent neural network 6 have been successfully applied in intelligent fault diagnosis. However, their good classification performance usually relies on two basic assumptions: (1) the training and test data are independent and identically distributed; (2) the task to be diagnosed has sufficient labeled fault samples.7,8 As the distribution discrepancy between training and test samples increases, the generalization performance of the trained model drops significantly.
Transfer learning relaxes the restriction of traditional machine learning that training and test data must be independent and identically distributed. In transfer learning, the source and target domains do not need to follow the same distribution. Transfer learning can mine domain-invariant features between different but related domains, so that labeled data and other supervised information can be transferred between the two domains.9,10 In fault diagnosis, transfer diagnosis methods based on traditional techniques have been studied, such as transfer component analysis, 11 singular value decomposition, and TrAdaBoost. 12 Kang et al. 13 introduced a transfer fault diagnosis method based on multi-feature construction and variational mode decomposition.
In addition, deep transfer learning methods that combine deep learning with transfer learning have also been explored. Yang et al. 14 developed a transfer diagnosis model called polynomial kernel maximum mean discrepancy (PK-MMD). Shao et al. 15 described a transfer diagnosis method based on a fine-tuned transfer network. Jiao et al. 16 designed two distinct one-dimensional convolutional networks as the basic structure to learn discriminative and domain-invariant representations. Li et al. 17 proposed training the model with a multi-layer equalization domain adaptation method. Wang et al. 18 presented a transfer learning approach that uses a pre-trained CNN to extract features from different data sets, followed by an online incremental support vector machine to classify the various cases. Hasan et al. 19 presented a reliable transfer fault diagnosis approach using acoustic spectral imaging. Zhou et al. 20 proposed a multi-level deep convolutional transfer learning scheme to transfer fault diagnosis ability to other instruments. Wang et al. 21 introduced a deep multi-scale intra-class transfer diagnosis approach to reduce the distribution difference between working conditions. Liu et al. 22 designed a joint distribution optimal domain adaptation approach for transfer fault diagnosis. An et al. 23 exploited self-learning transferable networks for mechanical fault diagnosis with unlabeled and unbalanced data. Similar research on deep transfer diagnosis can be found in Wu et al. 24 and Han et al. 25 Qian et al. 26 combined an improved DenseNet with joint distribution adaptation for transfer diagnosis. The core idea of these methods is to learn the feature information of the two working conditions automatically with a deep learning model and to achieve knowledge transfer by narrowing the gap between the two working conditions.
Generative adversarial networks (GANs) are widely used in unsupervised and semi-supervised learning to improve generalization ability.27,28 The unsupervised adversarial idea of GANs helps the target model learn domain-invariant representations from unlabeled data. Li et al. 29 described an adversarial domain adaptation approach using knowledge mapping. Jiao et al. 30 put forward a double-level adversarial model for cross-domain diagnosis. Chai and Zhao 31 presented a fine-grained network for industrial fault diagnosis. Guo et al. 32 proposed an intelligent scheme named deep convolutional transfer learning network to realize cross-domain diagnosis. Li et al. 33 introduced a two-stage transfer diagnosis method for multi-fault detection that can effectively separate multiple new unlabeled fault types. Si et al. 34 proposed an unsupervised deep network based on moment matching, in which grayscale time-frequency images were used as the network input and two adaptation methods were applied to reduce the distribution difference. Jia et al. 35 designed a joint distribution adaptation-based transfer network to enhance feature extraction ability. Li et al. 36 described a class-weighted adaptive neural network that encourages positive transfer of the shared classes and ignores the source outliers. Shao et al. 37 proposed an adversarial domain adaptation method that combines MMD with a domain confusion function. Zhang et al. 36 presented a small-sample intelligent fault diagnosis method using a multi-modal gradient penalty generative adversarial network. Li et al. 38 illustrated a partial transfer fault diagnosis approach using a weighted adversarial transfer network, in which a weighted learning strategy weights the samples' contributions to the domain discriminator and the source classifier (Figure 1).

The network of DMATD.
The studies above achieved transfer diagnosis without labels in the target domain. However, they have the following deficiencies:
They do not consider the problem of trivial solutions in the process of transfer adaptation, so the predicted categories are not diverse enough.
They do not consider the uncertainty of the classifier's predictions, which degrades the performance of the discriminator.
To tackle these deficiencies, a deep diversity maximization-based adversarial transfer diagnosis (DMATD) approach for rotating machinery is proposed. Its three key improvements are as follows:
The diversity maximization strategy is adopted to balance entropy minimization and thus avoid trivial local minima, so that the categories predicted by the nontrivial domain adaptation method are more diverse.
The entropy of the predicted category vector is used to evaluate the uncertainty of the classifier's predictions and thereby adjust the domain discriminator. The entropy strategy and the designed network structure can be extended to other application scenarios.
The input is the raw vibration signal, which realizes end-to-end fault diagnosis. Experiments on rolling bearings under variable working conditions were designed.
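Because the loss equations themselves are not reproduced in this text, the following is only a minimal sketch of the first improvement, assuming a common formulation: the entropy-minimization term is the mean per-sample prediction entropy, and the diversity term is the entropy of the batch-mean prediction. The function names and the tradeoff weight `lam` are hypothetical. The example shows why entropy minimization alone admits a trivial solution: a batch that collapses onto one class is exactly as confident as a batch spread over all classes.

```python
import math


def entropy(p, eps=1e-12):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def entropy_min_loss(probs):
    """Mean per-sample prediction entropy; minimizing it makes each prediction confident."""
    return sum(entropy(p) for p in probs) / len(probs)


def diversity_loss(probs):
    """Entropy of the batch-mean prediction; maximizing it spreads predictions over the classes."""
    k = len(probs[0])
    mean_p = [sum(p[j] for p in probs) / len(probs) for j in range(k)]
    return entropy(mean_p)


def balanced_objective(probs, lam=1.0):
    """Quantity to minimize: confident per-sample predictions, diverse batch-level predictions."""
    return entropy_min_loss(probs) - lam * diversity_loss(probs)


# Both batches are perfectly confident, so entropy minimization alone cannot tell them apart.
collapsed = [[1.0, 0.0, 0.0, 0.0]] * 4  # every sample predicted as class 0 (trivial solution)
diverse = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(balanced_objective(collapsed), balanced_objective(diverse))
```

The collapsed batch scores about 0 while the diverse batch scores about -ln 4 ≈ -1.386, so subtracting the diversity term steers optimization away from the single-class collapse.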
The rest of this article is arranged as follows: Section “Preliminaries” gives the problem definition and preliminaries. Section “Proposed fault diagnosis approach” presents the DMATD diagnosis network in detail. In Section “Case study,” the case study is analyzed to verify the DMATD. Finally, main conclusions and future work are given in Section “Conclusion.”
Preliminaries
Problem definition
Suppose there is a monitoring dataset
Feature encoder
CNNs have become excellent feature encoders and have achieved outstanding performance in many fields.40,41 Because the vibration signal is a one-dimensional time series, a one-dimensional deep convolutional neural network (1D-DCNN) is adopted in this paper as the feature encoder to obtain the feature information of the source and target working conditions.
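To illustrate how a 1D convolutional encoder consumes a raw time series, the following pure-Python sketch implements one single-channel convolution, ReLU, and max-pooling stage. The kernel size, stride, and pooling size are hypothetical values chosen for illustration; the actual encoder structure is given in Table 2, and a real encoder stacks several multi-channel stages with learned kernels.

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1D cross-correlation, which is what CNN 'convolution' layers compute."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]


def relu(xs):
    return [max(0.0, x) for x in xs]


def max_pool1d(xs, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def conv_stage(signal, kernel, stride=1, pool=2):
    """One convolution -> ReLU -> max-pooling stage of a 1D CNN (single channel)."""
    return max_pool1d(relu(conv1d(signal, kernel, stride)), pool)


def output_length(n, k, stride=1, pool=2):
    """Length after a valid convolution followed by non-overlapping pooling."""
    return ((n - k) // stride + 1) // pool


# Hypothetical first layer: 64-point kernel, stride 8, pooling size 2, on a 1024-point sample.
x = [0.0] * 1024
print(len(conv_stage(x, [0.0] * 64, stride=8, pool=2)), output_length(1024, 64, 8, 2))
```

A wide first kernel with a large stride is one common way to shorten long raw signals quickly; the output-length helper makes it easy to check how many points survive each stage.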
Proposed fault diagnosis approach
The samples x are a batch of source or target samples, and the vector f is the feature obtained after the feature learning process. F maps the x to
For supervised learning in the source domain, the loss function of the classifier is
where,
where,
And
Compared with using more complex cross-domain discrepancy, the end-to-end training of network parameter
The entropy of
The discriminator D is constructed to distinguish the features of the source and target working conditions, while the generator G is trained against it in a min-max adversarial manner. Learning domain-invariant features means finding the optimal parameter
where,
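One way to let prediction entropy adjust the discriminator, borrowed here from the entropy conditioning of CDAN (the method 4 baseline), is to weight each sample's domain loss by 1 + exp(-H(p)), so that confidently classified samples count more. Whether DMATD uses exactly this weight is an assumption, and the helper names are hypothetical. The sketch computes a certainty-weighted binary cross-entropy over the discriminator outputs.

```python
import math


def entropy(p, eps=1e-12):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def certainty_weight(p):
    """CDAN+E-style weight 1 + exp(-H(p)): about 2 for a one-hot prediction,
    1 + 1/K for a uniform prediction over K classes."""
    return 1.0 + math.exp(-entropy(p))


def weighted_domain_loss(d_outputs, domain_labels, probs, eps=1e-12):
    """Certainty-weighted binary cross-entropy of the domain discriminator.

    d_outputs: discriminator probabilities that each sample comes from the source.
    domain_labels: 1 for source samples, 0 for target samples.
    probs: classifier probability vectors used to compute the weights.
    """
    weights = [certainty_weight(p) for p in probs]
    total = 0.0
    for d, y, w in zip(d_outputs, domain_labels, weights):
        bce = -(y * math.log(d + eps) + (1 - y) * math.log(1 - d + eps))
        total += w * bce
    return total / sum(weights)  # normalize so the loss scale does not depend on the weights
```

In an adversarial setup, the discriminator minimizes this loss while the feature generator is trained to maximize it (for example through a gradient reversal layer); that training loop is omitted here.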
The general procedure of the DMATD is displayed in Figure 2, and the complete procedures are as follows:
Collect the labeled monitoring data
The DMATD model is designed and randomly initialized, and the constructed training datasets are fed into it. The classification loss of the source working condition is calculated, together with the entropy minimization loss and the diversity maximization loss that avoid trivial local minima. Furthermore, the discrimination loss is calculated to make the feature distributions of the two domains as similar as possible. Finally, the DMATD is optimized with the total loss.
The monitoring data

The general procedure of the DMATD.
Case study
In this section, experiments on rolling bearings under variable working conditions were carried out, and six transfer tasks were created to verify the DMATD.
Dataset description
The case data came from an ABLT-1A accelerated bearing life test bench, as displayed in Figure 3(a). Figure 3(b) shows the data acquisition system. The tested bearings were 6204 single-row deep groove ball bearings. To effectively detect the running state of each bearing, the vibrations of the four bearings were measured by four single-axis acceleration sensors. In addition, four thermocouple sensors and one acceleration sensor were used to detect the temperature of the bearing outer rings and the vibration of the whole test bench, respectively. The four installed bearings were numbered bearing 1 to bearing 4 from left to right. Bearing 1 was set to four states: normal (N), inner and outer race compound fault (IOF), outer race fault (OF), and inner race fault (IF), while the other three bearings were normal.

The experiments of rolling bearings: (a) The test bench, (b) data acquisition system, (c) bearing installation, and (d) four states of bearing 1.
Each fault on the rolling bearing was 1.8 mm deep and 1.2 mm wide. The installation of the four bearings and the tested bearings are shown in Figure 3(c) and (d), respectively. The radial load was applied by adding weights, and the loading mechanism transmitted this pressure to the test head at a ratio of 100:1. Three working conditions were designed in this experiment: A (1800 rpm, 2.5 kN), B (2100 rpm, 2.5 kN), and C (2100 rpm, 5 kN). For example, working condition A means that the bearing speed was 1800 rpm and the designed load was 2.5 kN. The data acquisition card was a National Instruments 9234, and the sampling frequency of the whole experiment was 12.8 kHz. The bearing dataset is described in Table 1.
Description of the bearing dataset.
N: normal; IOF: inner and outer race compound fault; OF: outer race fault; IF: inner race fault.
Experimental results
The acquisition of data set and the construction of network
To verify the DMATD, six transfer tasks were designed. Transfer task A → C means that 100% of the labeled data from condition A (source) and 50% of the unlabeled data from condition C (target) were used for training, and the remaining 50% of the data from condition C was used for testing. The target domain did not have health labels. Figure 4 shows the vibration signals of working conditions A, B, and C.
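The data split for a transfer task such as A → C can be sketched as follows. The random, non-stratified shuffle with a fixed seed is an assumption (the text does not say how the 50/50 target split was drawn), and `make_transfer_task` is a hypothetical helper; a split stratified by health state may be preferable in practice.

```python
import random


def make_transfer_task(source, target, target_train_ratio=0.5, seed=0):
    """Build the data for one transfer task such as A -> C.

    source: list of (signal, label) pairs from the source condition; all labels are used.
    target: list of (signal, label) pairs from the target condition; labels are dropped
            from the training part and kept only to score the held-out test part.
    """
    rng = random.Random(seed)
    idx = list(range(len(target)))
    rng.shuffle(idx)
    n_train = int(len(target) * target_train_ratio)
    target_train = [target[i][0] for i in idx[:n_train]]  # unlabeled signals only
    target_test = [target[i] for i in idx[n_train:]]      # labels used for evaluation only
    return list(source), target_train, target_test
```

Keeping the target labels out of the training part is what makes the adaptation unsupervised; they are touched only when scoring the held-out half.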

The vibration signals in the time domain: (a) Designed work state A, (b) designed work state B, and (c) designed work state C.
The feature encoder in this paper was built as a 1D-DCNN, whose structure is depicted in Table 2. The raw vibration data of the rolling bearings were fed into the 1D-DCNN, and each sample contained 1024 data points. The learning rate was set as
The structure of feature encoder.
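As stated above, each input sample contained 1024 raw data points. A minimal sketch of cutting a long vibration record into such samples follows; the text does not say whether the windows overlapped, so non-overlapping windows are the default here as an assumption, and `segment` is a hypothetical helper.

```python
def segment(signal, length=1024, step=None):
    """Cut a long vibration record into fixed-length samples.

    step defaults to length (non-overlapping windows); a smaller step gives
    overlapping windows. Trailing points that do not fill a window are dropped.
    """
    step = step or length
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, step)]
```

At the 12.8 kHz sampling frequency reported above, one 1024-point sample spans 1024 / 12800 = 0.08 s, which covers about 2.4 shaft revolutions at 1800 rpm and 2.8 revolutions at 2100 rpm.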

The analysis of tradeoff parameters.
The training of transfer diagnosis network
For transfer task C → A, the curves of the total training loss, the classification loss, and the diversity maximization loss are shown in Figure 6. The total training loss and the classification loss converge well, while the diversity loss dynamically fine-tunes the total training loss. After training, the transfer diagnosis knowledge is established. To visualize the features learned from the source and target data, t-SNE 43 is used to reduce their dimensionality. Figure 7 shows the visualization for C → A: the source and target features of the same category are well clustered.

Three losses in the training process.

The t-SNE visualization of the two domains.
Health status identification of working conditions in the target domain
The data set
Comparison with other methods
Five other methods are compared to verify the effectiveness of the DMATD. In method 1 (CNN), the model is trained on the source domain only, without domain adaptation, and applied directly to diagnose the target working condition. In method 2, the difference between the two domains is reduced by MMD 44. In method 3, domain adaptation is performed with the correlation alignment (CORAL) loss 45. In method 4, conditional domain adversarial networks (CDAN) 46 are introduced. In method 5, entropy minimization without diversity maximization (entropy minimization only, EMO) is adopted for domain adaptation. Table 3 compares the DMATD with these methods. The accuracy of the DMATD is 30.5%, 19.9%, 24.3%, 10.0%, and 7.5% higher than that of CNN, MMD, CORAL, CDAN, and EMO, respectively.
Comparison with five other methods.
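CNN: convolutional neural network; DMATD: deep diversity maximization-based adversarial transfer diagnosis; MMD: maximum mean discrepancy; CORAL: correlation alignment; CDAN: conditional domain adversarial networks; EMO: entropy minimization only.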
t-SNE is used to visualize the features of the health states of the target working condition, as displayed in Figure 8. In Figure 8(a), the scattered points of the same class are well clustered together: the DMATD not only narrows the gap between the two domains but also enlarges the distance between different categories, indicating that it identifies the target working condition well. In Figure 8(b), there are still some misidentified points. In Figure 8(c), the scatter diagram is chaotic and misclassification is obvious, showing that the method without domain adaptation performs poorly on the target working condition, whereas the methods with domain adaptation raise the identification accuracy. The corresponding confusion matrices are depicted in Figure 9. Owing to limited space, the visualizations of the MMD, CORAL, and CDAN methods are not presented here; their accuracies lie between those of the DMATD and CNN.

The t-SNE dimension reduction of the different methods: (a) DMATD, (b) EMO, and (c) CNN.

The corresponding confusion matrix: (a) DMATD, (b) EMO, and (c) CNN.
The performance analysis of the DMATD
The training time of different methods
Figure 10 displays the training time of different methods. The time consumption of the DMATD lies in the middle of the compared methods, which is acceptable.

The training time of different methods.
The comparison of the sensitivity and specificity
The DMATD is further assessed by sensitivity and specificity. True positive (TP) denotes the number of samples correctly identified as positive, true negative (TN) the number correctly identified as negative, false positive (FP) the number misidentified as positive, and false negative (FN) the number misidentified as negative. Therefore, the sensitivity and specificity are calculated as

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}$$
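For a multi-class problem such as the four bearing states here, TP, TN, FP, and FN are usually counted one-vs-rest for each class from the confusion matrix, with sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). The following sketch assumes that one-vs-rest convention (the text does not state how the classes were aggregated); the confusion-matrix counts in the example are invented for illustration and are not results from the paper.

```python
def sensitivity_specificity(confusion):
    """Per-class one-vs-rest sensitivity and specificity.

    confusion[i][j] = number of samples of true class i predicted as class j.
    Assumes every class has at least one true sample and one non-member sample.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # class c predicted as something else
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other classes predicted as c
        tn = total - tp - fn - fp
        results.append((tp / (tp + fn), tn / (tn + fp)))
    return results


# Hypothetical counts: 5 IOF samples misread as OF.
cm = [[50, 0, 0, 0], [0, 45, 5, 0], [0, 0, 50, 0], [0, 0, 0, 50]]
for name, (se, sp) in zip(["N", "IOF", "OF", "IF"], sensitivity_specificity(cm)):
    print(f"{name}: sensitivity={se:.3f}, specificity={sp:.3f}")
```

Here the five IOF samples misread as OF lower the IOF sensitivity to 0.900 and the OF specificity to about 0.967, while the other values stay at 1.000.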

The analysis of sensitivity and specificity.
Conclusion
A DMATD approach for rotating machinery is presented in this paper to solve the problem that existing transfer diagnosis methods based on entropy minimization tend to converge to trivial solutions. The accuracy of the DMATD method is 30.5%, 19.9%, 24.3%, 10.0%, and 7.5% higher than that of CNN, MMD, CORAL, CDAN, and EMO, respectively. The diversity maximization strategy effectively balances entropy minimization and thus avoids trivial local minima, so that the categories predicted by the nontrivial domain adaptation method are more diverse. Moreover, entropy is used to evaluate the uncertainty of the predicted results, and this entropy-based strategy adjusts the domain discriminator. As a result, the DMATD achieves fault diagnosis of rotating machinery under variable working conditions.
The author’s research in the future will mainly focus on domain generalization.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the National Natural Science Foundation of China (Grant No. 52105517).
