Active domain adaptation method for label expansion problem

Abstract

Over the past few years, cross-domain fault detection methods based on unsupervised domain adaptation (UDA) have gradually matured. However, existing methods usually assume that the source and target domains share the same label space and ignore the problem of label expansion in the target domain. In such problems the source domain lacks transferable knowledge of the newly added health categories, so the domain-invariant features extracted by the UDA model correlate strongly with the source domain health categories but lack the key features needed to distinguish the newly added ones. We found that most diagnostic results for such samples are distributed at the decision boundaries of the source domain health categories, which means that samples of the newly added health categories carry a large amount of information. Therefore, this paper uses active learning to select samples of newly added health categories in the target domain to assist model training, and proposes an active domain adaptation intelligent fault detection framework, LDE-ADA, to deal with the label expansion problem. The method is analyzed and compared on a rotating machinery dataset through six transfer tasks. The results show that, when there is one new health category, LDE-ADA improves accuracy by about 9.39% when three samples are labeled per round over 20 rounds of training. The experiments show that the method is effective for the label expansion problem.

Keywords

Active domain adaptation, active learning, domain adaptation, convolutional neural network, fault diagnosis

Introduction

As mechanical equipment grows in scale and its functions become more complex, increasing attention has been paid to preventing equipment failures in order to avoid unnecessary economic losses.¹ Signal processing methods² require inspection personnel with rich knowledge and experience, and are difficult to apply to real-time monitoring. Machine learning (ML) methods³ require substantial manual effort for feature processing, and high-dimensional features are difficult to mine. In recent years, deep learning (DL) technology⁴ has been widely studied for equipment fault detection because of its powerful feature extraction capability. However, training a high-performance DL model requires a large number of labeled samples, and collecting these labeled samples is expensive, which is an essential reason why deep models are difficult to apply successfully. Meanwhile, most existing deep learning methods assume that the source and target domains have the same data distribution. In the actual operation of mechanical equipment, changing working conditions (such as changes in speed and load) and sudden changes in temperature make this assumption unrealistic. Therefore, the performance of a well-trained deep model is greatly compromised when it is applied in practice.

Transfer learning (TL) can mine domain invariant fundamental features and structures in two different but related domains, which enables the information learned from the source domain to be transferred and reused between domains.⁵ In recent years, transfer learning methods have been increasingly applied to the fault diagnosis of rotating machinery.^6,7 Unsupervised domain adaptation (UDA) is a representative method in transfer learning. This method generally utilizes minimum domain spacing^8,9 or adversarial strategies^10,11 to apply the knowledge learned by the model from the source domain to the detection of the target domain, so as to solve the problem of mapping bias. In the past few years, unsupervised domain adaptation has been gradually applied and developed in the fields of image classification^12,13 and mechanical fault detection.^14–16

However, the detection method based on unsupervised domain adaptation still has some defects, and two problems are more prominent:

(1) Performance issues with unsupervised models. Domain adaptation enables cross-domain diagnosis by solving the problem of mapping bias between source and target domains. However, the diagnostic performance of the unsupervised domain adaptation models is far less than that of most supervised diagnostic models,^17,18 and even a small number of target domain label samples can significantly improve the diagnostic performance of the model.

(2) Label domain expansion problem. Most of the current mainstream domain adaptation diagnosis methods assume that the source domain and the target domain have the same label domain space. However, when the target domain has more health categories than the source domain, it is difficult for the domain adaptation model to detect the newly added health categories.

Neglecting the above problems will cause the model to misdiagnose the health status of the equipment in practical applications, resulting in unnecessary economic losses. This problem occurs in the model due to the lack of transferable knowledge of the newly added health categories in the source domain during training, resulting in the domain invariant features extracted by the model only having strong correlation with the source domain health categories, and lack of key features that can identify the newly added health categories. We found that most of the prediction results of the model for the newly added health categories are distributed at the decision boundary of the source domain health category, so this means that the newly added health category has a higher amount of information in the mapped feature invariant space, as shown in Figure 1.

Figure 1.

After marginal distribution alignment, the newly added health categories are mostly at the decision boundary of the source domain health categories. The UDA method without considering the label domain expansion problem is difficult to detect the newly added health categories. Our proposed method aims to correctly classify the original health categories while identifying the newly added health categories.

In recent years, some researchers have used sample selection algorithms to extract informative samples in the target domain to assist model training and thereby improve the diagnostic performance of unsupervised models. Active learning (AL) aims to select the most valuable samples from a pool of unlabeled samples using a query strategy. Among AL approaches, the pool-based active learning (Pool-Based AL)^19–21 sample selection method has been widely studied. Most previous active learning methods use a single query strategy to select samples and train models in the same domain.^22,23 With the development of transfer learning techniques, active learning has been applied to cross-domain sample selection, so active transfer learning^24,25 has been intensively studied. Domain adaptation (DA), as a branch of transfer learning, combined with active learning is called active domain adaptation (ADA).^26–28 The training steps of ADA are similar to those of basic AL and are generally divided into two parts: model training and query strategy, as shown in Figure 2. Fu et al.¹⁸ proposed a new transferable query selection (TQS) method for active domain adaptation, including transferable uncertainty, transferable domainness, and transferable committee. It is experimentally demonstrated that TQS can select the target samples with the largest amount of information under domain transfer. Su et al.²⁹ proposed an active learning method for transferring representations across domains. This active adversarial domain adaptation (AADA) method explores the duality between two related issues: adversarial domain alignment and importance sampling for cross-domain adaptation models. Zhou et al.³⁰ proposed a discriminative active learning method for domain adaptation to reduce the workload of data annotation, and demonstrated the effectiveness of this active domain adaptation algorithm. However, previous active domain adaptation methods do not consider the impact of label domain expansion on model diagnostic performance.

Figure 2.

Where (a) is the pool-based active learning framework and (b) is the LDE-ADA intelligent fault detection framework proposed in this paper.

In view of the above problems, this paper considers that samples of the newly added health categories carry a large amount of information in the domain-invariant space after marginal distribution alignment. An active domain adaptation intelligent fault detection framework, LDE-ADA, is designed to solve the label domain expansion problem in cross-domain fault diagnosis. The method first uses the UDA model to learn domain-invariant features, which addresses the domain bias problem of ADA and improves the query accuracy for newly added health category samples. It then uses an improved active learning query strategy to select the most valuable samples from the target domain sample pool for labeling. Finally, it retrains the model on the labeled fusion sample set and repeats the above steps. The improved query strategy is designed to accurately select samples of the newly added health categories in the target domain to assist model training. The strategy first uses an uncertainty-based sampling method to select the target domain samples that the model cannot distinguish. Representative samples are then selected from these samples by a clustering algorithm. This approach not only selects the samples with the largest amount of information but also ensures that the selected samples are representative, so that each round of training improves model performance as much as possible. The specific query strategy and training method are introduced in Section 3.

The main points and contributions of this paper are as follows:

(1) In this paper, an active domain adaptation intelligent fault detection framework, LDE-ADA, is proposed to solve the label domain expansion problem. This method takes into account that, in the feature-invariant space after marginal distribution alignment, most samples of the newly added health categories are distributed at the decision boundary and carry a large amount of information. Therefore, the framework first performs marginal distribution alignment to improve the accuracy of active learning sample queries.

(2) An improved active learning query strategy is proposed. The query strategy can more accurately select samples of the newly added health categories in the target domain, and is suitable for solving the label domain expansion problem of UDA.

(3) The influencing factors of the diagnostic performance of the active domain adaptation framework are analyzed through experiments. These influencing factors provide ideas for the efficient application of active domain adaptation models in the field of fault detection.

The rest of this paper is organized as follows: The second part is related concepts, including domain adaptation, active learning, and active domain adaptation. In the third part, the LDE-ADA intelligent fault detection framework is introduced in detail. In the fourth part, the effectiveness and superiority of the proposed method are proved on the bearing test platform, and the factors affecting the active domain adaptation are analyzed. The fifth part summarizes this paper.

Related work

Domain adaptation

Domain adaptation (DA) is designed to solve the problem of biased feature space mappings between the source and target domains. Unsupervised domain adaptation is one such paradigm, in which no labeled sample data is available in the target domain. For unsupervised domain adaptation, we generally assume that the source domain $D_{s}$ has $n_{s}$ labeled samples and the target domain $D_{t}$ has $n_{t}$ unlabeled samples, denoted respectively as $X_{S} = \{(x_{i}^{s}, y_{i}^{s})\}_{i = 1}^{n_{s}}$ and $X_{T} = \{x_{j}^{t}\}_{j = 1}^{n_{t}}$ . A domain adaptation model is learned for the case where the sample marginal distributions differ, $P_{S} (X) \neq P_{T} (X)$ , and the label spaces coincide, $Y_{s} = Y_{t}$ . While correctly predicting the source domain data, the model minimizes the difference between the feature distributions of the source and target domains in order to minimize the risk of misclassification $E_{(x^{t}, y^{t}) \sim D_{t}} [C (G (x^{t})) \neq y^{t}]$ .

In recent years, domain adaptation work under this assumption has achieved great success. The key idea is to fit the data distributions of the source and target domains by minimizing the domain spacing or by using adversarial methods. However, when $Y_{s} = \{1, 2, \dots, N_{class}^{s}\}$ , $Y_{t} = \{1, 2, \dots, N_{class}^{t}\}$ , and the number of health categories satisfies $N_{class}^{t} > N_{class}^{s}$ , it is difficult for a model trained under the above conditions to detect the newly added health categories, which degrades detection in the target domain. It is known that a small number of informative labeled samples can greatly improve the performance of the model.³¹ Therefore, we aim to select samples of these newly added health categories and use a small number of labeled samples to solve such problems.

Active learning

Active learning aims to select, from unlabeled datasets, the samples that contribute most to improving model performance, which effectively addresses the high cost of sample labeling in some fields. The most important part of AL is the formulation of the query strategy. At present, the main query strategies are committee query, uncertainty sampling query, and representative query. A single query strategy performs far worse than a mixture of multiple strategies.³² For AL, we generally let $U_{n}$ denote the dataset $U$ containing $n$ unlabeled samples, and $L_{m}$ denote the training set $L$ containing $m$ labeled samples. The goal is to design a query strategy $Q$ that selects the $K$ most informative samples from $U_{n}$ and hands them to a human expert $O$ for labeling; the labeled samples are denoted $L_{K}$ . The labeled samples $L_{K}$ are then added to $L$ , $L \leftarrow L_{m} \cup L_{K}$ , and removed from $U$ , $U \leftarrow U_{n} \setminus L_{K}$ . Finally, the model is retrained with the new fusion sample set $L$ . This process is repeated until a high-performance diagnostic model is obtained.
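The pool-based loop described above can be sketched in a few lines. This is only an illustrative outline under stated assumptions: the function name `active_learning_rounds` and the callables `query`, `oracle`, and `retrain` are hypothetical placeholders for the query strategy $Q$ , the expert $O$ , and model retraining, and they are not taken from the paper.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Sample = Hashable  # any hashable sample identifier (assumption for this sketch)


def active_learning_rounds(
    labeled: Dict[Sample, int],
    unlabeled: Set[Sample],
    query: Callable[[Set[Sample], int], List[Sample]],
    oracle: Callable[[Sample], int],
    retrain: Callable[[Dict[Sample, int]], None],
    k: int,
    rounds: int,
) -> Tuple[Dict[Sample, int], Set[Sample]]:
    """Run a generic pool-based active learning loop.

    labeled   : the labeled set L (sample -> label)
    unlabeled : the unlabeled pool U
    query     : strategy Q, returns up to k samples chosen from the pool
    oracle    : expert O, returns the label of one sample
    retrain   : retrains the model on the current labeled set
    """
    labeled = dict(labeled)      # work on copies so the inputs stay untouched
    unlabeled = set(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        picked = query(unlabeled, k)       # L_K selected by Q
        for x in picked:
            labeled[x] = oracle(x)         # L <- L_m union L_K
        unlabeled -= set(picked)           # U <- U_n minus L_K
        retrain(labeled)                   # retrain on the fusion sample set
    return labeled, unlabeled
```

Passing the strategy, the expert, and the training step as callables keeps the loop independent of any particular model, so different query strategies can be compared without changing the loop itself.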

Only considering the query strategy of uncertainty sampling, it is unavoidable to select samples with similar amounts of information, and such samples have little improvement in model performance. On the other hand, query strategies that only consider representativeness may select samples from non-newly added health categories, resulting in increased labeling costs.

Active domain adaptation

ADA uses active learning to select a small number of target domain samples for labeling, so as to maximize the gain of the DA model. Different from the above work, our work aims to solve the label domain expansion problem in domain adaptation through active learning. Since the transferable knowledge of newly added health categories does not exist in the source domain, it is impractical to solve the label domain expansion problem only by UDA. However, AL can add the transferable knowledge of the newly added health categories to the source domain with the smallest label cost, and extract the relevant features of the newly added health categories to solve the problem of label domain expansion.

LDE-ADA intelligent fault detection framework

In this section, we introduce the detection method of LDE-ADA. Among them, Section 3.1 introduces the specific structure and loss function of the unsupervised adversarial domain adaptive networks based on minimum domain spacing (MDS-ADAN)³³; Section 3.2 introduces our active learning query strategy in detail; Section 3.3 summarizes the training steps of the LDE-ADA model.

Domain adaptation diagnostic model based on MDS-ADAN

In this work, we choose the MDS-ADAN model as the basic domain adaptation diagnostic model of LDE-ADA; its network structure is shown in Figure 3, where BN stands for batch normalization and DP stands for the dropout operation. The model is divided into four parts: feature extractor $G_{f}$ , adaptation layer, classifier $C_{y}$ , and domain discriminator $D_{d}$ . $G_{f}$ consists of the first four layers of AlexNet,³⁴ and an adaptation layer of dimension 256 is added after the feature extractor to prevent overfitting, as shown in Table 1. The MDS-ADAN model not only reduces the distribution differences between domains in the feature extraction stage, but also considers the importance of the weight parameters in the classification stage for fitting the feature distribution. This is reflected in the use of the maximum mean discrepancy (MMD) at two places, the end of the adaptation layer and the end of the classifier, to measure the distribution difference between domains. Its mathematical expression is:

Table 1.

MDS-ADAN feature extractor and bottleneck layer structure and parameters.

Layer	Kernel size	Stride	Padding	Output size
Input	/	/	/	1024 × 1
Conv 1	32 × 1	4	0	64 − [249 × 1]
Max-pool 1	3 × 1	2	0	64 − [124 × 1]
Conv 2	5 × 1	2	2	192 − [62 × 1]
Max-pool 2	3 × 1	2	0	192 − [30 × 1]
Conv 3	3 × 1	2	1	384 − [15 × 1]
Conv 4	3 × 1	2	1	256 − [8 × 1]
Max-pool 3	3 × 1	2	0	256 − [3 × 1]
Flattening layer	1D signal	/	/	1024
Adaptation layer	/	/	/	256
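The signal lengths in the "Output size" column of Table 1 can be checked with the usual one-dimensional convolution and pooling length rule. The sketch below assumes the floor-rounding formula $\lfloor (n + 2p - k) / s \rfloor + 1$ used by common deep learning frameworks; the names `out_len`, `LAYERS`, and `trace_lengths` are illustrative and do not come from the paper.

```python
from typing import List, Tuple


def out_len(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a 1D convolution or pooling layer (floor rounding assumed)."""
    return (n + 2 * padding - kernel) // stride + 1


# (layer name, kernel size, stride, padding), copied from Table 1
LAYERS: List[Tuple[str, int, int, int]] = [
    ("Conv 1", 32, 4, 0),
    ("Max-pool 1", 3, 2, 0),
    ("Conv 2", 5, 2, 2),
    ("Max-pool 2", 3, 2, 0),
    ("Conv 3", 3, 2, 1),
    ("Conv 4", 3, 2, 1),
    ("Max-pool 3", 3, 2, 0),
]


def trace_lengths(n: int = 1024) -> List[Tuple[str, int]]:
    """Propagate an input of length n through the layers and record each output length."""
    lengths = []
    for name, kernel, stride, padding in LAYERS:
        n = out_len(n, kernel, stride, padding)
        lengths.append((name, n))
    return lengths


if __name__ == "__main__":
    for name, length in trace_lengths():
        print(f"{name}: {length}")
```

Starting from the 1024-point input, this reproduces the per-layer lengths 249, 124, 62, 30, 15, 8, and 3 listed in the table.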

MMD (X_{S}, X_{T}) = {\left\| \frac{1}{| X_{S} |} \sum_{x_{s} \in X_{S}} ϕ (x_{s}) - \frac{1}{| X_{T} |} \sum_{x_{t} \in X_{T}} ϕ (x_{t}) \right\|}_{H}

(1)

Where $X_{S}$ and $X_{T}$ represent the source domain and target domain sample sets, respectively. $ϕ (\cdot) : X_{S}, X_{T} \to H$ represents the mapping from the original space to the reproducing kernel Hilbert space (RKHS).
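To make equation (1) concrete, the following sketch evaluates it for the simplest possible feature map, the identity $ϕ (x) = x$ , in which case MMD reduces to the Euclidean distance between the two sample means. The choice of the identity map is an assumption made only for illustration; the kernel actually used in MDS-ADAN is not specified here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_embedding(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average of phi(x) over a sample set, with phi taken as the identity (assumption)."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(x[d] for x in samples) / n for d in range(dim)]


def linear_mmd(xs: Sequence[Sequence[float]], xt: Sequence[Sequence[float]]) -> float:
    """Norm of the difference between the source and target mean embeddings."""
    mu_s = mean_embedding(xs)
    mu_t = mean_embedding(xt)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_s, mu_t)))


if __name__ == "__main__":
    source = [[0.0, 0.0], [2.0, 0.0]]   # mean (1, 0)
    target = [[4.0, 4.0], [4.0, 4.0]]   # mean (4, 4)
    print(linear_mmd(source, target))   # distance between (1, 0) and (4, 4)
```

With a nonlinear kernel the same quantity is usually estimated from pairwise kernel evaluations rather than from an explicit feature map, but the interpretation as a distance between mean embeddings is unchanged.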

Figure 3.

Model structure of MDS-ADAN.

Considering that the target domain labels are not available, the MDS-ADAN model draws on the idea of generative adversarial networks (GAN)³⁵ and adds a domain discriminator $D_{d}$ to perform domain discrimination on the input dataset. This process helps to make the feature representation robust, improve the generalization ability of the trained model, and avoid overfitting to small-sized labeled samples.³⁶ We use 1 to represent the source domain data and 0 to represent the target domain data, denoted as $D_{d} (G_{f}) \to \{0, 1\}$ . The objective function of $D_{d}$ is defined as:

L_{d} = E_{x \sim P_{S} (x)} [\log D_{d} (G_{f} (x))] + E_{x \sim P_{T} (x)} [\log (1 - D_{d} (G_{f} (x)))]

(2)

The objective function of $C_{y}$ is defined as:

L_{y} = - E_{x \sim P_{S} (x)} [\log (C_{y} (G_{f} (x)), y)]

(3)

Where $x \sim P_{S} (x)$ indicates that the sample comes from the source domain dataset, $x \sim P_{T} (x)$ indicates that the sample comes from the target domain dataset, and $y$ indicates the classification label of the sample. During training, the feature extractor is trained so that its features confuse the domain discriminator, while the domain discriminator is trained to correctly classify the source domain and target domain features; the two form an adversarial game. The adversarial loss is expressed as:

\min_{θ_{f}, θ_{y}} \max_{θ_{d}} L_{y} - μ L_{d}

(4)

Where $θ_{f}, θ_{y}$ , and $θ_{d}$ are the weight parameters of $G_{f}, C_{y}$ , and $D_{d}$ , respectively, and $μ$ is a hyperparameter.

The MDS-ADAN model not only performs distribution calibration at the adaptation layer, but also reduces the feature distribution differences between domains at the end of the classifier. Thus the overall loss is expressed as:

L_{G} = L_{y} + λ_{1} L_{MMD_{1}} + λ_{2} L_{MMD_{2}} + μ L_{d}

(5)

Where $L_{MMD_{1}}$ and $L_{MMD_{2}}$ represent the distribution difference measures at the adaptation layer and at the end of the classifier, respectively. $λ_{1}$ and $λ_{2}$ are hyperparameters; in this paper $λ_{1} = λ_{2} = 0.25$ and $μ = 1$ .

The MDS-ADAN model effectively fits the data distribution of the source and target domains in the pre-training stage of LDE-ADA, learns domain-invariant features, and improves the accuracy of active learning sample queries.

Active learning query strategy

Our purpose is to select the most valuable target domain samples for labeling, that is, samples of the newly added health categories in the target domain. Commonly used uncertainty sampling methods are the margin sampling method (MS) and the entropy method (ET), which measure the classifier output vector and select the samples that are most difficult for the model to diagnose. The margin sampling method, equation (6), and the entropy method, equation (7), are expressed as follows:

x_{M}^{*} = \underset{x \in U_{n}}{\operatorname{argmin}} \left( P (y_{1} | x) - P (y_{2} | x) \right)

(6)

x_{E}^{*} = \underset{x \in U_{n}}{\operatorname{argmax}} \left( - \sum_{k = 1}^{K} P (y_{k} | x) \log P (y_{k} | x) \right)

(7)

where $P (y_{1} | x)$ and $P (y_{2} | x)$ represent the probabilities of the most likely and the second most likely categories for the sample $x$ , respectively, and $K$ is the number of categories.

In this work, we propose an uncertainty sampling method that combines the above two sampling methods, namely the entropy margin difference algorithm (EM), which takes the difference between the entropy measure and the margin measure. This method uses a smaller sampling set to improve the precision of the sample query. To determine the effectiveness of the method, we randomly select two groups of target domain data (100 samples per group) and feed them into a well-trained domain adaptation model for result analysis. The three uncertainty sampling based query methods are analyzed according to the output vector, as shown in Figure 4.

Figure 4.

The domain adaptation model is trained by 9 types of health categories in the source domain and 10 types of health categories in the target domain, and the output vector analysis is performed by using the uncertainty sampling query strategy. EM can accurately select newly added health categories.

The two groups of data are marked as group A and group B; group A contains 5 samples of the newly added health category and group B contains 10. In group A, we set the number of selected samples to 5 and find that the entropy method mistakenly selects one sample that does not belong to the newly added health category. In group B, we set the number of selected samples to 5 and to 10 and find that, when 10 samples are selected, the margin sampling method is not effective in selecting the newly added health category. To sum up, across these three experiments the EM method effectively selects the newly added health category samples of each group. Therefore, our uncertainty sampling query strategy is:

x_{EM}^{*} = \underset{x \in U_{n}}{\operatorname{argmax}} \left[ - \sum_{k = 1}^{K} P (y_{k} | x) \log P (y_{k} | x) - \left( P (y_{1} | x) - P (y_{2} | x) \right) \right]

(8)
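As a minimal sketch of the three uncertainty measures, the code below scores softmax output vectors by entropy, by margin, and by their difference, and ranks samples by the difference. It assumes that the EM score of a sample is its entropy minus its margin, which is one reading of the description above; the function names are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy of a probability vector (natural logarithm)."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def margin(p: Sequence[float]) -> float:
    """Gap between the largest and the second largest class probability."""
    top = sorted(p, reverse=True)
    return top[0] - top[1]


def em_score(p: Sequence[float]) -> float:
    """Entropy minus margin (assumed form of the EM measure): larger means more uncertain."""
    return entropy(p) - margin(p)


def select_most_uncertain(probs: Sequence[Sequence[float]], n: int) -> List[int]:
    """Indices of the n samples with the highest EM score, most uncertain first."""
    order = sorted(range(len(probs)), key=lambda i: em_score(probs[i]), reverse=True)
    return order[:n]


if __name__ == "__main__":
    outputs = [
        [0.98, 0.01, 0.01],   # confident prediction
        [0.40, 0.35, 0.25],   # close to a decision boundary
        [0.70, 0.20, 0.10],   # moderately confident
    ]
    print(select_most_uncertain(outputs, 2))
```

A sample near a decision boundary has both high entropy and a small margin, so the two terms reinforce each other, whereas a confident prediction is penalized by both.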

Samples selected by uncertainty sampling alone may be highly similar in feature information, and such samples do not maximize the performance gain of each round of training. Therefore, among the target domain samples selected in each round, cluster centers are determined by K-means clustering, and the samples closest to the cluster centers are selected for manual annotation. The distance between a sample and a cluster center is measured as:

d (x, y) = \sqrt{{(x_{1} - y_{1})}^{2} + {(x_{2} - y_{2})}^{2} + \cdots + {(x_{n} - y_{n})}^{2}}

(9)

Where $d (x, y)$ represents the distance from the sample $x$ to the cluster center $y$ , and $n$ represents the vector dimension.
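The representative selection step can be illustrated with a small, dependency-free K-means routine followed by a nearest-to-center lookup using the distance of equation (9). This is a sketch under stated assumptions: it uses plain Lloyd iterations with random initial centers and a fixed seed, which may differ from the clustering setup used in the experiments, and the names `kmeans` and `representatives` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def euclidean(x: Point, y: Point) -> float:
    """Euclidean distance between a sample x and a cluster center y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(points: Sequence[Point], k: int, iters: int = 50, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd iterations; initial centers are k randomly chosen points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members; keep empty clusters in place.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def representatives(points: Sequence[Point], k: int, seed: int = 0) -> List[int]:
    """For each cluster center, return the index of the closest not-yet-chosen sample."""
    chosen: List[int] = []
    for center in kmeans(points, k, seed=seed):
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: euclidean(points[i], center)))
    return chosen


if __name__ == "__main__":
    feats = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(sorted(representatives(feats, 2)))
```

Picking one sample per cluster spreads the labeling budget over distinct regions of the feature space instead of spending it on near-duplicates.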

Intelligent fault detection model based on LDE-ADA

Based on the analysis of the domain adaptation diagnostic model and active learning query strategy, we divide the training process of the LDE-ADA intelligent fault detection framework into three parts.

Step A: First, we pretrain a domain adaptation diagnostic model to fit the data distribution of the same health categories in the source and target domains, and learn domain-invariant features. In order to accurately select the newly added health category of the target domain, this step is crucial. Among them, the mathematical expression of the cross-entropy loss function of training the network according to the source domain samples is:

L (X_{S}, Y_{S}) = - E_{(x_{s}, y_{s}) \sim (X_{S}, Y_{S})} \sum_{k = 1}^{K} l (y_{s} = k) \log P (y_{k} | x_{s})

(10)

\min_{G_{f}, C_{y}} L (X_{S}, Y_{S})

(11)

Where $K$ represents the number of categories; $P (y_{k} | x_{s})$ represents the probability at the $k$ -th position of the output vector when the input vector is $x_{s}$ ; and $l (y_{s} = k)$ is an indicator that equals 1 when the true label $y_{s}$ is category $k$ and 0 otherwise.
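Because the indicator in equation (10) keeps only the term of the true category, the loss for one sample is simply the negative log-probability assigned to its label. The sketch below averages this over a batch; it assumes zero-based integer labels and already normalized probability vectors, and the function name is illustrative.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross-entropy over a batch.

    probs  : one probability vector per sample (each sums to 1)
    labels : zero-based index of the true category of each sample (assumption)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Only the k = y term survives the indicator in the sum over categories.
        total += -math.log(p[y])
    return total / len(labels)


if __name__ == "__main__":
    batch = [[0.5, 0.5], [0.25, 0.75]]
    print(cross_entropy(batch, [0, 1]))
```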

Step B: On the basis of Step A, we query the newly added health categories in the target domain. The purpose is to use samples of these categories to assist the retraining of the model and thereby solve the problem of label domain expansion. The EM method is used to measure the classifier output vector, and the target domain samples that are difficult for the model to discriminate are selected according to equation (8).

Then, $K$ cluster centers are determined for the above samples, and the $K$ samples closest to the cluster centers are selected according to equation (9). This avoids repeatedly selecting samples with similar amounts of information.

Step C: Finally, the selected $K$ samples are manually labeled and added to the source domain sample set $X_{S}$ . At the same time, the selected samples are deleted from the target domain sample set $X_{T}$ . The updated sample sets are expressed as $X_{S, K} = \{x_{1}, x_{2}, \dots, x_{n}, \dots, x_{n + K}\}$ and $X_{T, K} = \{x_{1}, x_{2}, \dots, x_{n - K}\}$ . Starting from the weight parameters of the previous round, the model is retrained on the labeled fusion sample set $X_{S, K}$ . Step B and Step C are repeated until the model achieves the desired effect. The specific algorithm is summarized in Table 2.

Table 2.

The training process of the LDE-ADA framework.

Algorithm 1 LDE-ADA model
Input: labeled source data $X_{S}$ ; unlabeled target data $X_{T}$ ; number of AL rounds $r$ ; number of batches $b$ ; number of cluster centers $K$ . Model: $M = \{G_{f}, D_{d}, C_{y}\}$ .
Output: LDE-ADA model for detecting target domains.
1: Train $M$ with ( $X_{S}, X_{T}$ ) → Initialize model parameters $θ$
2: for round = 0 to $r$ do
3: for batch = 1 to $b$ do
4: calculate $x_{EM}^{*}$ according to Eq. 8
6: each batch select $N$ samples according to $x_{EM}^{*}$
7: find the centers $C = \{c_{1}, c_{2}, \dots, c_{K}\}$ according to $N \times b$ samples
8: calculate the distance $d$ from each sample to the centers according to Eq. 9
9: get labeled samples $L_{K}$ according to minimum distance
10: $X_{S}$ ← $X_{S} \cup L_{K}$ , $X_{T}$ ← $X_{T} \setminus L_{K}$
11: Train $M$ with ( $X_{S}, X_{T}$ )
12: end
13: end

When dealing with unseen target domain data, models often produce diagnostic outputs that are overconfident and untrustworthy.³⁷ While minimizing the domain shift, the LDE-ADA framework uses manual intervention to correct the diagnosis of newly added target domain health categories whose distribution is unknown, so as to improve the model’s ability to detect health states outside the distribution of the source domain health categories. Even when the target domain distribution is unknown, the diagnosis of the LDE-ADA framework still has high confidence.

Experimental results and analysis

Datasets and methods

Dataset

The rotating machinery dataset we use is from the public database of Case Western Reserve University, and the test object is a bearing. The test platform is shown in Figure 5. The experimental platform consists of a 2 HP motor, a torque transducer/encoder, a dynamometer, and control electronics. The bearings were seeded with faults using electro-discharge machining (EDM), with damage diameters of 0.007, 0.014, 0.021, 0.028, and 0.04″. The bearings to be tested are divided into drive end bearings and fan end bearings. Bearing health status is divided into Normal state (NS), Inner race fault (IF), Outer race fault (OF), and Roller fault (RF). All data are collected under four motor speed conditions: 1797, 1772, 1750, and 1730 rpm.

Figure 5.

Case Western Reserve University bearing experimental test bench.³⁸

After data processing, we select the drive end bearing dataset for the experiments. Transfer task 9A denotes the sampled data for the nine health states of NS, IF, OF, and RF at a load of 0 HP: IF and OF each have three damage states with diameters of 0.007, 0.014, and 0.021″, and RF has two damage states with diameters of 0.007 and 0.014″. Similarly, transfer task 9B denotes the sample data at a load of 1 HP, and transfer task 9C denotes the sample data at a load of 2 HP. Transfer tasks 10B, 10C, and 10D denote the sampled data for 10 health states at loads of 1, 2, and 3 HP, respectively. The details are shown in Table 3.

Table 3.

Rolling bearing experimental transfer task description.

Transfer tasks	Source domain	Target domain	Source domain health conditions	Target domain health conditions	Number of source/target domain samples
9A–10B	0 HP/1797 rpm	1 HP/1772 rpm	NS, IF (0.007/0.014/0.021), OF (0.007/0.014/0.021), and RF (0.007/0.014)	NS, IF (0.007/0.014/0.021), OF (0.007/0.014/0.021), and RF (0.007/0.014/0.021)	1100/1000
9A–10C	0 HP/1797 rpm	2 HP/1750 rpm
9A–10D	0 HP/1797 rpm	3 HP/1730 rpm
9B–10C	1 HP/1772 rpm	2 HP/1750 rpm
9B–10D	1 HP/1772 rpm	3 HP/1730 rpm
9C–10D	2 HP/1750 rpm	3 HP/1730 rpm

Methods

In our experiments, we select two domain adaptation models, MDS-ADAN and DANN.³⁹ When training the domain adaptation model, set the batch size to 100 and the iteration period to 200 rounds. The AL methods involved in the comparison include LDE-ADA, Random, Uncertainty, Cluster, AC-DANN, and AADA.²⁹ The active learning round is set to 20 rounds, and the number of labeled samples in each round is 3.

LDE-ADA

The domain adaptation model is MDS-ADAN and the AL framework adopted is the method proposed in Section 3.3.

Random

The domain adaptation model is MDS-ADAN and the AL query strategy is to randomly select the target domain samples.

Uncertainty

The domain adaptation model is MDS-ADAN and the AL query strategy selects target domain samples by uncertainty sampling (including the entropy method and the margin sampling method).

Cluster

The domain adaptation model is MDS-ADAN and the AL query strategy is a method based on K-means clustering to select the target domain samples.

AC-DANN

The domain adaptation model is DANN and the AL framework adopted is the method proposed in Section 3.3.

AADA

The domain adaptation model is DANN and the AL query strategy is to select samples based on importance weights.

In order to ensure the reliability of the results, each task is run 10 times and the average value is taken. The specific experimental results are shown in Table 4.

Table 4.

Accuracy rate of rolling bearing fault data set (%).

Framework	DA Model	Method	Transfer tasks
			9A–10B	9A–10C	9A–10D	9B–10C	9B–10D	9C–10D
Non-AL	MDS-ADAN	MDS-ADAN	90.20 ± 0.96	89.76 ± 1.23	88.01 ± 1.26	90.76 ± 0.25	89.92 ± 1.07	90.11 ± 1.25
	DANN	DANN	78.37 ± 1.36	78.29 ± 1.33	81.46 ± 1.10	78.81 ± 0.95	78.55 ± 1.12	78.37 ± 1.36
AL	MDS-ADAN	LDE-ADA	99.66 ± 0.26	97.81 ± 1.26	99.69 ± 0.22	99.24 ± 0.37	99.45 ± 0.36	99.66 ± 0.26
		Random	91.53 ± 1.27	91.93 ± 0.52	89.45 ± 0.71	91.76 ± 0.52	92.03 ± 0.52	92.08 ± 0.43
		Uncertainty	97.09 ± 1.41	98.67 ± 0.56	97.68 ± 0.82	98.42 ± 0.92	96.03 ± 1.60	98.09 ± 0.93
		Cluster	91.40 ± 0.62	92.44 ± 0.98	91.73 ± 0.67	94.66 ± 0.56	93.71 ± 0.52	94.05 ± 0.66
	DANN	AC-DANN	93.36 ± 1.23	93.95 ± 0.89	96.86 ± 1.25	93.11 ± 1.33	91.67 ± 2.17	93.36 ± 1.23
		AADA	90.47 ± 0.95	93.15 ± 1.26	91.97 ± 0.69	92.58 ± 1.02	92.50 ± 0.75	92.23 ± 0.62
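Each entry in Table 4 summarizes repeated runs. The sketch below shows one way such an entry could be produced, assuming that the value after "±" is the sample standard deviation across runs (the text does not say which deviation is used). The run accuracies are placeholder numbers, not experimental results.

```python
import statistics

def summarize(accuracies):
    """Format repeated-run accuracies as 'mean ± std' with two decimals."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample standard deviation (assumption)
    return f"{mean:.2f} ± {std:.2f}"

# Placeholder accuracies (%) for five hypothetical runs of one transfer task.
runs = [90.0, 92.0, 91.0, 89.0, 93.0]

print(summarize(runs))
```

If the population standard deviation is intended instead, `statistics.pstdev` can be substituted for `statistics.stdev`.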

Experimental results and analysis

To verify how well the LDE-ADA framework detects the newly added health category, we show the accuracy, confusion matrix, and t-SNE visualization results for transfer task 9A–10B. We illustrate the factors that influence active domain adaptation and demonstrate the superiority of the LDE-ADA framework from the following three aspects.

Active learning: The active domain adaptation models achieve higher accuracy than the plain domain adaptation models, as shown in Table 4. Compared with the domain adaptation methods MDS-ADAN and DANN, the active domain adaptation methods LDE-ADA and AC-DANN improve the average accuracy over the six transfer tasks by 9.39% and 14.56%, respectively. As can be seen from Figure 7(a) and (e), the active learning framework in this paper can effectively detect newly added health categories on top of different domain adaptation models. In particular, the accuracy of LDE-ADA on the newly added health category exceeds 98%. The t-SNE visualization in Figure 8(a) shows that LDE-ADA effectively aligns the feature distributions of samples with the same health status and separates the newly added health category. This shows that the method proposed in this paper can effectively solve the label domain expansion problem.

Domain adaptation model: A good domain adaptation model is a precondition for efficient diagnosis by the ADA model, because it extracts more critical feature information in each round of active learning. As shown in Table 4, LDE-ADA and AC-DANN use the same query strategy, but the initial accuracy of active domain adaptation is determined by how well MDS-ADAN and DANN fit the source and target domains. In addition, the domain adaptation model serves as the training model in each round of active learning, so its ability to extract key features affects the entire training stage. If the model cannot extract key features that fit the two domains, the results of active domain adaptation are unsatisfactory. Conversely, the better the DA model fits the two domains, the better the effect of active domain adaptation. This shows that the diagnostic capability of the active domain adaptation model depends on the underlying domain adaptation model.

Query strategy: A good query strategy selects, from the unlabeled target domain, the samples that are most helpful for improving model performance. As shown in Table 4, Random, Uncertainty, Cluster, and AADA use different query strategies. Among them, Random and Cluster involve considerable randomness, so they rarely select the most valuable samples. The samples selected by these two strategies do little to improve the model, and the accuracy changes little across rounds, as shown in Figure 6. Figure 7 further shows that these two methods cannot cope with the label domain expansion problem. Although Uncertainty can pick out samples that are difficult for the model to classify, it tends to select similar samples, so the resulting model does not perform as well as LDE-ADA. Compared with the recent active domain adaptation method AADA, LDE-ADA has an advantage in the domain adaptation model, and its query strategy also selects the desired samples for labeling more effectively. This shows that the diagnostic ability of the active domain adaptation model depends on the choice of query strategy.

As can be seen from Table 4, the average accuracy of LDE-ADA over the six transfer tasks is about 99.25%. Figure 7(a) shows that LDE-ADA detects the newly added health category with an accuracy of more than 98%, and Figure 8(a) shows that LDE-ADA can effectively fit the data distributions of the source and target domains. Together, these results indicate that the method can effectively identify the newly added health categories in the target domain.
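The per-category accuracy read from a confusion matrix can be computed as in the sketch below, which follows the layout of Figure 7 (rows are true labels, columns are predicted labels). The 3×3 count matrix is a made-up example, not data from the experiments, and the function name is hypothetical.

```python
def per_class_accuracy(cm):
    """Fraction of each true class (row) that is predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(cm)]

# Made-up counts: rows are true labels, columns are predicted labels.
cm = [
    [100, 0, 0],
    [2, 97, 1],
    [0, 1, 99],
]

print(per_class_accuracy(cm))
```

Reading the row of the newly added category in this way isolates how well the new class is detected, independently of the overall accuracy.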

Figure 6.

Accuracy on transfer task 9A–10B. Each accuracy curve is the average of 10 experiments.

Figure 7.

Confusion matrices for transfer task 9A–10B. The horizontal axis is the predicted label, the vertical axis is the true label, and label 9 is the newly added health category. Panels (a), (b), and (c) show the per-category accuracy of LDE-ADA, Random, and Uncertainty, respectively; panels (d), (e), and (f) show the per-category accuracy of Cluster, AC-DANN, and AADA, respectively.

Figure 8.

t-SNE visualization for transfer task 9A–10B. Panels (a), (b), and (c) show the visualization results of LDE-ADA, Random, and Uncertainty, respectively; panels (d), (e), and (f) show the visualization results of Cluster, AC-DANN, and AADA, respectively. “o” represents the 9 categories of data in the source domain, and “*” represents the 10 categories of data in the target domain. The purple box in (a) marks the newly added health category.

Research on the number of annotations

We further study the relationship between the number of newly added health categories in the target domain and the number of samples that LDE-ADA labels per round. We compare the diagnostic performance of the model when the number of newly added health categories is 1, 2, and 3, and the number of labeled samples per round is 3, 5, and 10. The details are shown in Table 5, in which each result is the average of five experiments.

Table 5.

Research on the relationship between the number of annotations and the number of newly added health categories.

Transfer task	Number of annotations per round
	0	3	5	10
9A–10B	90.20	99.23	99.91	99.98
8A–10B	81.40	89.82	99.49	99.98
7A–10B	63.00	78.73	85.71	94.84

It can be seen that as the number of newly added health categories in the target domain increases, labeling three samples per round is no longer sufficient for LDE-ADA to reach a satisfactory diagnostic accuracy. One way to solve this problem is to increase the number of annotations per round. When the number of newly added health categories reaches 3, increasing the number of labeled samples per round to 10 raises the accuracy to over 94%. The results with 10 labeled samples per round show that the diagnostic effect of LDE-ADA is related to the number of labels: the more health categories are added to the target domain, the more target domain samples need to be labeled in each round to improve the performance of the model. The results for transfer task 9A–10B in Table 5 also show that the more samples are labeled in each round, the greater the improvement in model performance.
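The trade-off between labeling effort and accuracy gain can be read off Table 5 with a few lines of arithmetic. The sketch below copies the accuracies from Table 5 and computes the gain over the zero-annotation baseline; the total label count assumes the same 20 active learning rounds used in the Methods section, which is not stated explicitly for this experiment.

```python
# Accuracy (%) from Table 5, keyed by number of annotations per round.
table5 = {
    "9A–10B": {0: 90.20, 3: 99.23, 5: 99.91, 10: 99.98},
    "8A–10B": {0: 81.40, 3: 89.82, 5: 99.49, 10: 99.98},
    "7A–10B": {0: 63.00, 3: 78.73, 5: 85.71, 10: 94.84},
}

ROUNDS = 20  # assumption: same number of rounds as in the Methods section

def gains(row):
    """Accuracy gain (percentage points) over the zero-annotation baseline."""
    base = row[0]
    return {k: round(acc - base, 2) for k, acc in row.items() if k != 0}

for task, row in table5.items():
    for k, gain in gains(row).items():
        print(f"{task}: {k} per round ({k * ROUNDS} labels in total) "
              f"-> +{gain:.2f} points")
```

For 9A–10B almost all of the gain is already obtained with three annotations per round, whereas for 7A–10B the gain keeps growing up to 10 annotations per round.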

Conclusion and future work

To deal with the label domain expansion problem, this paper proposes LDE-ADA, a diagnostic framework based on active domain adaptation. The framework is divided into three stages. In the first stage, the DA model is pre-trained: while the source domain samples are classified correctly, domain-invariant features are learned by optimizing the MMD loss function and adopting an adversarial strategy. In the second stage, an improved active learning query strategy is used to select the target domain samples that improve the model performance the most. In the third stage, the model is retrained using the fused labeled sample set, and the above steps are repeated. Experiments show that the framework generalizes well and that it improves the diagnostic performance of the DA model embedded in it. In addition, when the target domain has more health categories than the source domain, the framework can effectively detect the newly added health categories.
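The control flow of the three stages can be summarized by the skeleton below. It is only a structural sketch: `train`, `query`, and `oracle` are hypothetical callables standing in for DA model training, the query strategy, and manual annotation, and the toy stand-ins exist solely so that the loop can be executed. The defaults of 20 rounds and three labels per round follow the experimental settings.

```python
def run_active_domain_adaptation(source, target_pool, train, query, oracle,
                                 rounds=20, per_round=3):
    """Pre-train, then alternate between querying labels and retraining."""
    pool = list(target_pool)   # unlabeled target domain samples
    labeled = []               # (sample, label) pairs annotated so far
    model = train(source, labeled, pool)           # stage 1: pre-train DA model
    for _ in range(rounds):
        if not pool:
            break
        for x in query(model, pool, per_round):    # stage 2: select samples
            pool.remove(x)
            labeled.append((x, oracle(x)))         # annotate selected sample
        model = train(source, labeled, pool)       # stage 3: retrain on fused set
    return model, labeled


# Toy stand-ins (hypothetical) so the skeleton runs end to end.
def toy_train(source, labeled, pool):
    return {"n_labeled": len(labeled)}  # a "model" that only records label count

def toy_query(model, pool, k):
    return pool[:k]                     # placeholder: first k pool samples

def toy_oracle(x):
    return x % 10                       # placeholder label

model, labeled = run_active_domain_adaptation(
    source=list(range(100)), target_pool=list(range(200)),
    train=toy_train, query=toy_query, oracle=toy_oracle)

print(len(labeled), model)
```

With these defaults the annotation budget is 20 × 3 = 60 target domain samples, regardless of which query strategy is plugged in.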

We analyze the factors that influence active domain adaptation and find that the diagnostic effect of the active domain adaptation model is related to the domain adaptation model, the query strategy, and the number of labeled samples per round. The domain adaptation model determines how well the data distributions of the source and target domains are fitted. The query strategy determines how well valuable samples are selected in each round. The number of labeled samples in each round determines how quickly the model improves. In future work, we will apply this framework to more areas and explore further factors that influence active domain adaptation.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Project of the National Natural Science Foundation of China (no. 51204185, 51974295), Jiangsu Postgraduate Research and Practice Innovation Program Project (2021ALA02016), the Graduate Innovation Program of China University of Mining and Technology (2022WLJCRCZL267), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_2653).

ORCID iDs

Ruicong Zhang

Yu Bao

References

1. Chen, Wang, Qiao, et al. Basic research on machinery fault diagnostics: past, present, and future trends. Front Mech Eng 2018; 13(2): 264–291.

2. Wang, Han, Chu, et al. Vibration based condition monitoring and fault diagnosis of wind turbine planetary gearbox: a review. Mech Syst Signal Process 2019; 126: 662–685.

3. Lei, Yang, Jiang, et al. Applications of machine learning to machine fault diagnosis: a review and roadmap. Mech Syst Signal Process 2020; 138: 106587.

4. Saufi, Ahmad ZAB, Leong, et al. Challenges and opportunities of deep learning models for machinery fault detection and diagnosis: a review. IEEE Access 2019; 7: 122644–122662.

5. Zhang, Qin, et al. A systematic review of deep transfer learning for machinery fault diagnosis. Neurocomputing 2020; 407: 121–135.

6. Shao, Mcaleer, Yan, et al. Highly accurate machine fault diagnosis using deep transfer learning. IEEE Trans Ind Inform 2019; 15(4): 2446–2455.

7. Guo, Lei, Xing, et al. Deep convolutional transfer learning network: a new method for intelligent fault diagnosis of machines with unlabeled data. IEEE Trans Ind Electron 2019; 66(9): 7316–7325.

8. Tzeng, Hoffman, Zhang, et al. Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474, 2014.

9. Yang, Lei, Jia, et al. A polynomial kernel induced distance metric to improve deep transfer learning for fault diagnosis of machines. IEEE Trans Ind Electron 2019; 67(11): 9747–9757.

10. Shoeleh, Yadollahi, Asadpour. Domain adaptation-based transfer learning using adversarial networks. Knowl Eng Rev 2020; 35: e7.

11. Yang, Chen, et al. CLDA: an adversarial unsupervised domain adaptation method with classifier-level adaptation. Multimed Tools Appl 2020; 79(45): 33973–33991.

12. Chen, Yang, Zhang, et al. Conditional adaptation deep networks for unsupervised cross domain image classification. In: Proceedings of the 2019 14th IEEE conference on industrial electronics and applications (ICIEA), Xi'an, People's Republic of China, 2019. New York, NY: IEEE.

13. Zhao, Liu, Wen. A new method of image classification based on domain adaptation. Sensors 2022; 22(4): 1315.

14. Pandhare, Miller, et al. Intelligent diagnostics for ball screw fault through indirect sensing using deep domain adaptation. IEEE Trans Instrum Meas 2020; 70: 1–11.

15. Jiao, Zhao, Lin, et al. A mixed adversarial adaptation network for intelligent fault diagnosis. J Intell Manuf 2022; 33: 2207–2222.

16. Zhao, Jiang, Wang, et al. Joint distribution adaptation network with adversarial learning for rolling bearing fault diagnosis. Knowl Based Syst 2021; 222: 106974.

17. Chen, Sakaridis, et al. Domain adaptive Faster R-CNN for object detection in the wild. In: 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah.

18. Cao, Wang, et al. Transferable query selection for active domain adaptation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Nashville, TN, 20–25 June 2021. New York, NY: IEEE.

19. Nigam, Mccallum. Pool-based active learning for text classification. In: Proceedings of the conference on automated learning and discovery (CONALD), San Francisco, CA, 24–27 July 1998. New York, NY: ACM.

20. Zhao, Sukthankar. Importance-weighted label prediction for active learning with noisy annotations. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), Tsukuba, Japan, 11–15 November 2012. New York, NY: IEEE.

21. Zhang, Yin, Guo. Pool-based active learning with query construction. In: Wang (eds) Foundations of intelligent systems. Berlin, Heidelberg: Springer, 2011, pp.13–22.

22. Zhai, Deng, et al. Efficient active learning by querying discriminative and representative samples and fully exploiting unlabeled data. IEEE Trans Neural Netw Learn Syst 2020; 32(9): 4111–4122.

23. Joshi, Porikli, Papanikolopoulos. Multi-class active learning for image classification. In: Proceedings of the 2009 IEEE conference on computer vision and pattern recognition, Miami, FL, 20–25 June 2009. New York, NY: IEEE.

24. Singh, Chakraborty. Deep active transfer learning for image recognition. In: Proceedings of the 2020 international joint conference on neural networks (IJCNN), Glasgow, UK, 19–24 July 2020. New York, NY: IEEE.

25. Ashiquzzaman, Lee, et al. Study on human activity recognition using semi-supervised active transfer learning. Sensors 2021; 21(8): 2760.

26. Saha, Rai, Daumé, et al. Active supervised domain adaptation. In: Proceedings of the joint European conference on machine learning and knowledge discovery in databases, Athens, Greece, 5–9 September 2011. Berlin, Heidelberg: Springer.

27. Rai, Saha, Daumé III H, et al. Domain adaptation meets active learning. In: Proceedings of the NAACL HLT 2010 workshop on active learning for natural language processing, Stroudsburg, PA, 6 June 2010. New York, NY: ACM.

28. Huang, Yan. Active sentiment domain adaptation. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), Vancouver, Canada, July 2017. New York, NY: ACM.

29. J-C, Tsai Y-H, Sohn, et al. Active adversarial domain adaptation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, Snowmass, CO, 1–5 March 2020. New York, NY: IEEE.

30. Zhou, Shui, Yang, et al. Discriminative active learning for domain adaptation. Knowl Based Syst 2021; 222: 106986.

31. Kremer, Steenstrup Pedersen, Igel. Active learning with support vector machines. Wiley Interdiscip Rev Data Min Knowl Discov 2014; 4(4): 313–326.

32. Ren, Xiao, Chang, et al. A survey of deep active learning. ACM Comput Surveys 2021; 54(9): 1–40.

33. Ruicong, Zhongtian, et al. Unsupervised adversarial domain adaptive for fault detection based on minimum domain spacing. Adv Mech Eng 2022; 14(3): 16878132221088647.

34. Technicolor, et al. ImageNet classification with deep convolutional neural networks. Commun ACM 2017; 60: 84–90.

35. Goodfellow, Pouget-Abadie, Mirza, et al. Generative adversarial nets. Adv Neural Inform Process Syst 2014; 27: 2672–2680.

36. Han, Liu, Yang, et al. A novel adversarial learning framework in deep convolutional neural network for intelligent diagnosis of mechanical faults. Knowl Based Syst 2019; 165: 474–487.

37. Zhou, Han, Droguett EL. Towards trustworthy machine fault diagnosis: a probabilistic Bayesian deep learning framework. Reliab Eng Syst Saf 2022; 224: 108525.

38. Loparo. Case Western Reserve University bearing data centre website [Z]. 2012.

39. Ganin, Lempitsky. Unsupervised domain adaptation by backpropagation. In: 32nd International Conference on Machine Learning, Lille, France, 2015. PMLR.