Abstract
Rotating machinery fault diagnosis is essential to industrial production, and many intelligent fault diagnosis techniques have been applied successfully. Because machine damage usually occurs under varying working conditions and manually labeling data at scale is too expensive, domain adaptation has been developed for fault diagnosis. However, current methods mostly focus on global domain adaptation, and the application of subdomain adaptation to fault diagnosis is still limited. This study proposes a deep transfer learning method for rotating machinery fault diagnosis in which subdomain adaptation and adversarial learning are introduced to align the local and global feature distributions, respectively. Experiments are performed on two rotating machinery datasets to verify the effectiveness of the method. The results show that the method transfers well in both directions between domains and improves diagnostic performance.
Introduction
Rotating machinery is used across many industries, so timely and effective diagnosis of its faults is very important.1 With the maturing of big data and intelligent manufacturing concepts and technologies, intelligent fault diagnosis methods have been widely applied.2,3 Deep learning is also widely used for fault diagnosis because of its powerful feature learning ability.4–6 The success of these methods depends on a large amount of labeled data for supervised learning, and they require the training and testing data to come from the same distribution. However, manually labeling data at large scale is too expensive, and such data sometimes cannot be collected in practice. Moreover, because the working conditions of a machine usually change between tasks, training and testing data in real industrial settings generally differ in distribution. Predictive models trained on deep representations from one dataset therefore do not generalize well to new tasks.
Transfer learning can mine domain-invariant essential features and structures between two different but related domains, which enables supervised information such as labeled data to be transferred and reused across domains.7 Recently, transfer learning methods have been increasingly used in rotating machinery fault diagnosis.8 Shao et al. 9 used a pre-trained network and a fine-tuning strategy to achieve fast and accurate fault state classification. He et al. 10 proposed a deep transfer auto-encoder for fault diagnosis with small samples under different working conditions. However, these methods still require labels from the target domain.
Domain adaptation is a representative transfer learning method that learns domain-invariant features without using target labels, bridging the source and target domains.11 Because deep networks can learn more transferable features, embedding domain adaptation into deep learning can better match the distributions across domains.12,13 Minimizing the domain distribution discrepancy is the most popular approach in cross-domain fault diagnosis, and the maximum mean discrepancy (MMD) is commonly used as the metric.14,15 Li et al. 16 proposed a domain adaptation architecture that minimizes the multi-kernel MMD between two domains to realize cross-domain fault diagnosis. Wen et al. 17 also used MMD with a three-layer sparse auto-encoder for bearing fault diagnosis. Zhang et al. 18 used a domain adaptive convolutional neural network that combines MMD with a fine-tuning strategy for fault diagnosis under different working conditions.
Recently, adversarial domain adaptation, which integrates adversarial learning with domain adaptation, has been successfully embedded in deep networks to minimize the domain discrepancy through an adversarial objective defined on a domain discriminator.19 Guo et al. 20 combined adversarial adaptation with MMD minimization to design a deep convolutional domain adaptation network for bearing fault diagnosis. Li et al. 21 applied adversarial learning to rotating machinery fault diagnosis, and also combined adversarial training with a parallel data strategy to distinguish machine health conditions.22
However, these domain adaptation methods align only the global distributions of the source and target domains and ignore the crucial information carried by each category. As a result, because the target domain has no label information, there is no guarantee that samples from different domains but the same category will be mapped close together in the feature space. Subdomain adaptation addresses this by aligning the class-conditional distributions. As shown in Figure 1, 23 when fault categories are similar, such as different fault diameters of the same fault type, global domain adaptation brings the overall source and target distributions close together, but the features of different fault categories end up too close to be classified accurately. After subdomain adaptation, the fault categories are divided into subdomains and the same fault category can be aligned accurately across domains. Subdomain adaptation can exploit the dependency between features and categories to capture the underlying multi-mode structure of the data distributions.24 Xu et al. 25 proposed a conditional domain adaptation method based on a domain adversarial neural network for cross-domain fault diagnosis. Yu et al. 26 proposed a simulation data–driven domain adaptation approach that aligns the marginal and conditional distributions between simulation data and real monitoring data.

Figure 1. The comparison of global domain adaptation and subdomain adaptation.
In this study, a deep transfer learning method based on subdomain adaptation and adversarial learning is proposed for rotating machinery fault diagnosis. We use domain confusion to align the global feature distribution and the local maximum mean discrepancy (LMMD) to align the local feature distributions. We also pre-train the feature extractor on the ImageNet dataset and use a fine-tuning strategy to speed up training and improve accuracy. The experimental results show that this method greatly improves the extraction of transferable features between the two domains and improves diagnostic performance.
The remainder of this paper is organized as follows. Section 2 introduces the problem formulation and domain adaptation. Section 3 presents the proposed method in detail, and Section 4 verifies and analyzes the proposed algorithm experimentally. Finally, Section 5 concludes the paper.
Preliminary work
Problem formulation
This study considers the relationship between related fault subcategories in the rotating machinery transfer fault diagnosis problem. To state the problem, some symbols and definitions are first given in this section.
Let
Domain adaptation
As mentioned above, recent studies show that deep transfer learning can reduce the distribution shift between monitoring signals from two different domains while learning transferable representations of machinery fault features. However, these methods mainly align the overall distributions of the two domains and ignore the relationship between subdomains of the same class in different domains. In practice, when machinery fault monitoring signals are collected under different operating conditions or at different sensor installation locations, not only do the overall distributions of the two domains differ, but the relationship between same-class subdomains in the two domains also varies. To address this, this study proposes a novel deep adversarial transfer learning method that uses adversarial learning and subdomain adaptation for rotating machinery transfer fault diagnosis, aligning the global and local feature distributions of the source and target domains simultaneously.
Proposed method
Network architecture
In rotating machinery fault diagnosis, the purpose of this study is to design a deep transfer learning network that reduces not only the shift between the overall distributions of the source and target domains but also the shift between the local distributions within each category of the two domains, while learning transferable feature representations across domains. The architecture of the proposed deep adversarial transfer learning method based on subdomain adaptation is shown in Figure 2. The method consists of two feature extractors

Figure 2. The architecture of the proposed deep adversarial transfer learning method.
Feature extractor
In this study, machinery fault monitoring signals are collected by accelerometers under different operating conditions or at different installation locations. Because a vibration signal is non-stationary and time-varying while the machine is running, this study first applies the short-time Fourier transform (STFT) to obtain the time-frequency spectrum of the raw signals, which is then fed into the feature extractors as input.27,28
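As an illustration of the STFT preprocessing step, the following minimal Python sketch computes a magnitude spectrogram with a Hann window and a plain DFT. The function name `stft_magnitude`, the frame length, the hop size, and the toy 1 kHz tone are all hypothetical choices for this example; the paper does not state its STFT parameters, and a practical pipeline would use an FFT library instead of the naive transform shown here.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of frames, each holding magnitudes for bins 0..frame_len//2."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame (O(N^2), for illustration only).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        spectrogram.append(mags)
    return spectrogram


# Toy signal (assumed values): a 1 kHz tone sampled at 8 kHz.
fs, f0 = 8000, 1000
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(512)]
spec = stft_magnitude(x)

# Locate the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 64
```

Each row of `spec` is one time frame and each column one frequency bin, so the nested list can be rendered as the time-frequency image that is passed to the feature extractors.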
Due to the different input spaces in fault diagnosis, two feature extractors
Classifier
The classifier
Domain discriminator
The domain discriminator
Optimization objective
The proposed deep adversarial transfer learning method comprises four optimization objectives. For the classifier to identify source-domain health states correctly, supervised learning on the labeled source data is critical; this is objective 1. To align the global feature distribution, we use adversarial learning, which gives objectives 2 and 3: objective 2 trains the domain discriminator to recognize whether features come from the source or the target domain, and objective 3 aims to confuse the domain discriminator. To align the local feature distributions, we use LMMD as objective 4. These four optimization objectives are introduced in detail below.
Objective 1
To recognize rotating machinery health conditions, the proposed network must learn discriminative features from the labeled source-domain samples. Source supervision is therefore used to minimize the classification error. Concretely, the cross-entropy loss is taken as the source supervision loss, which is defined as follows,
where
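To make the source supervision loss concrete, here is a minimal sketch of a mean cross-entropy over labeled source samples. It is a generic cross-entropy, not a transcription of the paper's equation: the function names, the batch-mean normalization, and the toy logits are assumptions for illustration.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def source_classification_loss(batch_logits, labels):
    """Mean cross-entropy between predictions and integer class labels."""
    losses = []
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[y]))
    return sum(losses) / len(losses)


# Toy example with ten health states (hypothetical logits).
uninformed = source_classification_loss([[0.0] * 10], [3])
confident = source_classification_loss(
    [[0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], [3])
```

An uninformed classifier that assigns equal probability to all ten classes yields a loss of ln 10, and the loss falls toward zero as the probability of the true class approaches one.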
Objective 2
The domain discriminator is first applied to recognize whether the high-level features come from the source or the target domain. Domain recognition is therefore introduced to minimize the domain recognition error between the two domains. As before, the cross-entropy loss is taken as the domain recognition loss, which is defined as follows,
where
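The domain recognition objective can be sketched as a binary cross-entropy over domain labels. In this illustration the discriminator output is taken to be the probability that a feature comes from the source domain; that labeling convention, the function name, and the averaging over the combined batch are assumptions rather than details confirmed by the text.

```python
import math


def domain_recognition_loss(p_src_given_source, p_src_given_target):
    """Binary cross-entropy for a domain discriminator.

    Each argument is a list of predicted probabilities that a sample is
    from the source domain, for source and target samples respectively.
    """
    eps = 1e-12  # guards against log(0)
    loss_s = [-math.log(max(p, eps)) for p in p_src_given_source]
    loss_t = [-math.log(max(1.0 - p, eps)) for p in p_src_given_target]
    return (sum(loss_s) + sum(loss_t)) / (len(loss_s) + len(loss_t))


# A discriminator that cannot tell the domains apart (all outputs 0.5).
undecided = domain_recognition_loss([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates the domains well.
sharp = domain_recognition_loss([0.95, 0.9], [0.05, 0.1])
```

Minimizing this loss with respect to the discriminator parameters sharpens its ability to separate the two domains, which is what the adversarial objective then works against.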
Objective 3
To learn domain-invariant features, adversarial learning is employed in training with the domain discriminator. A gradient reversal layer is commonly applied to maximize the domain recognition loss and thereby extract domain-invariant features, as in a generative adversarial network. 30 However, this makes the discriminator converge too quickly and causes the gradient to vanish. In this study, we instead use a domain confusion loss as the adversarial loss to learn the feature mapping. This loss is the cross-entropy between the discriminator's predicted distribution over the binary domain labels and a uniform distribution, so minimizing it pushes the prediction for each input as close as possible to uniform over the two domains. The loss thus seeks domain invariance by making the two domains indistinguishable. Hence, the third optimization objective is calculated as follows,
In principle, equations (2) and (3) should be minimized simultaneously with respect to the representation parameters and the domain classifier parameters. However, these two losses are directly opposed: feature extractors that are fully domain invariant imply a domain discriminator that performs poorly, while a high-performing domain discriminator implies that the learned features are not domain invariant. Rather than optimizing the parameters globally, we therefore update the two objectives iteratively, each given the fixed parameters from the previous iteration. In this way, the loss ensures that the adversarial discriminator views the two domains equally.
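The domain confusion idea, a cross-entropy between the discriminator output and a uniform distribution over the two domain labels, can be sketched as follows. The function name and the toy probabilities are hypothetical, and the paper's own normalization may differ.

```python
import math


def domain_confusion_loss(p_source_batch):
    """Cross-entropy between a uniform target (1/2, 1/2) and the
    discriminator's predicted domain distribution (p, 1 - p)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p in p_source_batch:
        q = (max(p, eps), max(1.0 - p, eps))
        total += -sum(0.5 * math.log(q_d) for q_d in q)
    return total / len(p_source_batch)


# Features the discriminator cannot place in either domain.
confused = domain_confusion_loss([0.5, 0.5, 0.5])
# Features the discriminator still recognizes with high confidence.
recognized = domain_confusion_loss([0.9, 0.1, 0.95])
```

The loss reaches its minimum of ln 2 when every prediction is exactly 0.5, so minimizing it with respect to the feature extractor (with the discriminator held fixed) drives the features of the two domains toward being indistinguishable. Unlike maximizing the recognition loss, the penalty is symmetric in the two domains.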
Objective 4
Maximum mean discrepancy (MMD) 15 is a non-parametric distance estimate that is widely used in rotating machinery fault diagnosis to measure the difference between the target and source distributions. However, previous MMD-based deep transfer learning algorithms for fault diagnosis focused only on the global distributions and neglected the relationship between subdomains of the same class in different domains. Because these subdomains are related, it is important to align the distributions of the related same-class subdomains in the two domains. Therefore, the local maximum mean discrepancy (LMMD) is introduced to align the distributions of the related subdomains; it assumes each sample belongs to weight
where
where
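A common formulation of LMMD in the subdomain adaptation literature weights each sample by its class membership, using one-hot labels for the source domain and the classifier's soft predictions for the unlabeled target domain, and then averages a weighted MMD over classes. The sketch below follows that formulation as an assumption; the function names, the single Gaussian kernel, and the toy data are illustrative and are not taken from the paper's equations.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """k(a, b) = exp(-||a - b||^2 / bandwidth)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / bandwidth)


def class_weights(class_probs):
    """Normalize each class column of an (n, C) matrix to sum to 1."""
    n_classes = len(class_probs[0])
    col = [sum(row[c] for row in class_probs) for c in range(n_classes)]
    return [[row[c] / col[c] if col[c] > 0 else 0.0
             for c in range(n_classes)] for row in class_probs]


def weighted_kernel_sum(xa, wa, xb, wb, c, bandwidth):
    """Sum of w_i^c * w_j^c * k(x_i, x_j) over all pairs."""
    return sum(wa[i][c] * wb[j][c] * gaussian_kernel(xa[i], xb[j], bandwidth)
               for i in range(len(xa)) for j in range(len(xb)))


def lmmd(xs, ys_onehot, xt, yt_probs, bandwidth=1.0):
    """Class-averaged weighted MMD between source and target features."""
    ws, wt = class_weights(ys_onehot), class_weights(yt_probs)
    n_classes = len(ys_onehot[0])
    total = 0.0
    for c in range(n_classes):
        ss = weighted_kernel_sum(xs, ws, xs, ws, c, bandwidth)
        tt = weighted_kernel_sum(xt, wt, xt, wt, c, bandwidth)
        st = weighted_kernel_sum(xs, ws, xt, wt, c, bandwidth)
        total += ss + tt - 2.0 * st
    return total / n_classes


# Toy features: two classes, two samples each (hypothetical values).
xs = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
ys = [[1, 0], [1, 0], [0, 1], [0, 1]]
xt_shifted = [[1.0, 1.0], [1.1, 1.0], [4.0, 4.0], [4.1, 4.0]]
yt_soft = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

aligned = lmmd(xs, ys, xs, ys)               # identical subdomains
shifted = lmmd(xs, ys, xt_shifted, yt_soft)  # target shifted by (1, 1)
```

When the class-wise subdomains coincide the value is zero, and it grows as same-class features in the two domains drift apart, which is the quantity that the fourth objective penalizes.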
Once these optimization objectives are defined, the network is trained with the stochastic gradient descent (SGD) algorithm. Because the parameters of the feature extractors and classifiers are shared between the two domains, three modules are used: the feature extractor, the domain discriminator, and the classifier, whose parameters are denoted as
Equation (8) only updates
Based on equations (8) and (9), the parameters can be updated in each training epoch as follows:
where
In this way, the model learns domain-invariant features from the two domains during training. The trained model can then predict labels for the unlabeled target samples from these features.
Experimental study
Dataset descriptions
CWRU dataset
The CWRU dataset comes from the Case Western Reserve University Bearing Data Center. 31 It uses acceleration sensors to monitor bearings with single-point faults introduced by electrical discharge machining (EDM) and is widely used in fault diagnosis research. The experimental device is shown in Figure 3. The dataset includes four kinds of health conditions: normal operation (Nor), inner race fault (IF), outer race fault (OF), and ball fault (BF). Each fault type is created manually with fault diameters of 7, 14, and 21 mils, so ten health states are diagnosed in total. The data cover four motor load conditions (0 HP, 1 HP, 2 HP, and 3 HP) and are acquired at both the drive end and the fan end. To verify the proposed method, transfer was carried out both across load scenarios and across sensor positions. STFT is used to represent the vibration signals as time-frequency images. About 200 images are taken for each health state in each of the two domains, for a total of about 2000 images per domain.

Figure 3. Experiment device in the CWRU dataset.
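The class and task counts for the CWRU experiments follow directly from the description above and can be checked with a short enumeration. The label strings such as `IF-7` and the `source->target` task notation are hypothetical naming choices for this sketch, not the paper's identifiers.

```python
from itertools import permutations, product

# One normal state plus three fault types at three fault diameters (mils).
fault_types = ["IF", "OF", "BF"]
diameters_mil = [7, 14, 21]
health_states = ["Nor"] + [f"{fault}-{d}"
                           for fault, d in product(fault_types, diameters_mil)]

# Every ordered (source load, target load) pair with distinct loads.
loads_hp = [0, 1, 2, 3]
load_tasks = [f"{src}->{tgt}" for src, tgt in permutations(loads_hp, 2)]

print(len(health_states), "health states")
print(len(load_tasks), "load transfer tasks")
```

The enumeration gives 1 + 3 × 3 = 10 health states and 4 × 3 = 12 ordered load pairs, consistent with the ten health states stated above and the twelve load transfer tasks reported in the results.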
PHM 2009 challenge dataset
The PHM2009 dataset is published by the PHM Society. 32 It uses acceleration sensors to monitor mixed faults of gears, bearings, and rotating shafts. Each component can also be in one of several fault states, which makes fault diagnosis of the equipment challenging. The dataset contains two sets of gears: spur and helical. The experimental device and a schematic of the gearbox are shown in Figure 4.

Figure 4. Experiment device and schematic of the gearbox.
In this study, the spur gearbox dataset is used to test the accuracy of the proposed method. It has eight different health conditions under low and high load at speeds of 30, 35, and 40 Hz. The signal under each health condition is collected at a sampling frequency of 66.67 kHz with an acquisition time of 4 s. The label information of the dataset is shown in Table 1. To verify the proposed method, transfer was carried out across load scenarios and across speeds. STFT is used to represent the vibration signals as time-frequency images. About 200 images are taken for each health state in each of the two domains, for a total of about 1600 images per domain.
Table 1. Descriptions of the PHM 2009 challenge dataset.
IS: Input Shaft; IS: Input Side; ID: Idler Shaft; OS: Output Side; OS: Output Shaft; Nor: Normal.
Compared methods and training details
The proposed method is compared with other deep transfer learning methods: Deep Domain Confusion (DDC), 33 Deep Adaptation Network (DAN), 34 Domain-Adversarial Neural Networks (DANN), 35 Deep CORAL (D-CORAL), 36 and our previous work DADA-TL. 37 DDC embeds MMD into an adaptation layer to learn domain-invariant features. DAN uses multi-kernel MMD to optimally align different distributions and learn transferable features. DANN uses adversarial training to make the domain discriminator unable to distinguish the source and target domains, thereby improving domain adaptability. Deep CORAL uses the CORAL loss to match the source and target domains. We use ResNet-50 as the feature extractor for all of the above methods. We follow standard evaluation protocols for unsupervised domain adaptation and compare the average accuracy of each method over three random experiments. For all MMD-based methods and our proposed method, we adopt a Gaussian kernel with the bandwidth set to the median of the pairwise squared distances on the training data.
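The median bandwidth heuristic mentioned above can be sketched in a few lines. The function names and the toy samples are hypothetical, and the kernel form exp(-d²/bandwidth) is one common convention that is assumed here; some implementations scale the bandwidth by an additional constant.

```python
import math
from statistics import median


def squared_distance(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def median_bandwidth(samples):
    """Median of the squared distances over all distinct sample pairs."""
    sq_dists = [squared_distance(x, y)
                for i, x in enumerate(samples)
                for y in samples[i + 1:]]
    return median(sq_dists)


def rbf(x, y, bandwidth):
    """Gaussian kernel with the given bandwidth."""
    return math.exp(-squared_distance(x, y) / bandwidth)


# Toy one-dimensional features (assumed values).
features = [[0.0], [1.0], [3.0]]
bw = median_bandwidth(features)      # squared distances are 1, 9, 4
similarity = rbf([0.0], [2.0], bw)   # exp(-4 / 4)
```

Tying the bandwidth to the data scale keeps typical kernel values away from both 0 and 1, so the MMD-based losses remain sensitive to distribution differences without manual tuning.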
We use the PyTorch framework to implement all transfer learning methods and fine-tune the ResNet models provided by PyTorch, which have been pre-trained on the ImageNet 2012 dataset. The layers of the feature extractor are fine-tuned, while the layers of the domain discriminator and classifier are trained from scratch via backpropagation. The learning rates of the domain discriminator and classifier are therefore set to ten times that of the feature extractor. We use the learning rate annealing strategy of DANN, 35 in which the learning rate is adjusted during SGD with a momentum of 0.9 and is described by the following expression:
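For reference, the annealing schedule published with DANN is usually written as η_p = η_0 / (1 + αp)^β, where p is the training progress rising linearly from 0 to 1. The sketch below implements that form; the constants η_0 = 0.01, α = 10, and β = 0.75 are the values reported for DANN and are assumed here, since this paper's own settings are not confirmed by the text. The helper `module_learning_rates` is a hypothetical name that applies the tenfold ratio described above.

```python
def annealed_lr(progress, base_lr=0.01, alpha=10.0, beta=0.75):
    """DANN-style schedule: base_lr / (1 + alpha * progress) ** beta.

    progress is the fraction of training completed, from 0.0 to 1.0.
    """
    return base_lr / (1.0 + alpha * progress) ** beta


def module_learning_rates(progress):
    """Per-module rates: modules trained from scratch use 10x the
    rate of the fine-tuned feature extractor."""
    lr = annealed_lr(progress)
    return {
        "feature_extractor": lr,
        "classifier": 10.0 * lr,
        "domain_discriminator": 10.0 * lr,
    }


# Learning rates at the start, middle, and end of training.
schedule = [annealed_lr(p) for p in (0.0, 0.5, 1.0)]
rates_at_start = module_learning_rates(0.0)
```

The rate starts at the base value and decays smoothly as training progresses, which allows large early updates for the freshly initialized modules and finer adjustments later on.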
Results and analysis
CWRU dataset
There are a total of twelve transfer tasks under different load scenarios, where

Figure 5. The classification accuracy of CWRU dataset under different loads.
Furthermore, the confusion matrix of the proposed method for task

The confusion matrix of task
There are eight transfer tasks under different sensor locations, where

Figure 7. The classification accuracy of CWRU dataset under different sensor locations.
PHM 2009 challenge dataset
To further verify the proposed method, the PHM2009 dataset is used. Its multi-class mixed faults make the transfer fault diagnosis task more challenging. Table 2 shows the classification accuracy on this dataset under different speeds, and Table 3 shows the classification accuracy under different loads. Task 30–35 means that the data collected at 30 Hz are used as the source domain and the data collected at 35 Hz as the target domain. Task 30L-H means that the low-load data at 30 Hz are used as the source domain and the high-load data at 30 Hz as the target domain. The results show that in these mixed-fault transfer tasks the performance of global domain adaptation is unsatisfactory and degrades significantly across the transfer tasks. For transfer across loads in particular, the best average accuracy of the global domain adaptation methods is below 70%. The proposed method, which uses subdomain adaptation, greatly increases the classification accuracy: the average accuracy is 85.9% under different loads and 97.5% under different speeds. Overall, these results illustrate that our method can effectively realize fault diagnosis. In addition, we take the 40L-H task as an example to compare the domain confusion loss with the gradient reversal layer, as shown in Figure 8. The result shows that the domain confusion loss gives higher accuracy than the gradient reversal layer and is more suitable for our network architecture.
Table 2. Classification accuracy (%) on the PHM 2009 dataset under different speeds.
Table 3. Classification accuracy (%) on the PHM 2009 dataset under different loads.

Figure 8. Comparison of using domain confusion loss and gradient reversal layer.
To analyze the performance of our method intuitively, the t-SNE 38 technique is applied to project the features extracted by the feature extractor from the two domains onto a two-dimensional map. We visualize the 40L-H task, and the results are displayed in Figure 9. Global domain adaptation methods focus on marginal domain adaptation, so when fault categories are similar, the fault features in the source and target domains are not aligned well and some features are hard to classify. Although these methods reduce the distribution difference, the result is still unsatisfactory. With the proposed method, fault features of the same category in the two domains are aligned very well: same-category features from the two domains lie very close together, while features of different categories are well separated. Subdomain adaptation captures more fault category information, which effectively improves cross-domain fault diagnosis. The results suggest that the proposed method is more effective at reducing the distribution discrepancy between the two domains and illustrate its high performance intuitively.

Figure 9. The t-SNE visualization of features: (a) DAN, (b) DADA-TL, and (c) the proposed method.
Conclusions and future work
This paper presents a deep adversarial transfer learning method for rotating machinery fault diagnosis. Unlike previous methods, it uses domain confusion and the local maximum mean discrepancy to align the global distributions and the subdomain distributions simultaneously. In comparisons with other transfer learning methods on two rotating machinery datasets, the proposed method improves the extraction of domain-invariant, transferable features and greatly improves accuracy. Furthermore, in transfer tasks under various complex working conditions, the proposed method achieves the best results. This indicates that the method is an effective way to address the problem of unlabeled data in practical industrial applications. The method can also be extended to fault detection in other mechanical systems. Future work can focus on the imbalanced data problem and on more transfer scenarios. In addition, because different time-frequency analysis methods affect the accuracy of the model,39,40 we will investigate them in the future.
Footnotes
Handling Editor: James Baldwin
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by National Natural Science Foundation of China (Grant No. 51775323).
