Gearbox fault diagnosis method based on adaptive multi-sensor data level fusion and fine-grained domain adaptation

Abstract

Gearboxes are critical in industrial machinery, and leveraging multi-sensor data is crucial for effective fault diagnosis. However, robust diagnosis is particularly challenging under varying operating conditions, especially when the target data are unlabeled. This paper introduces an adaptive multi-sensor data-level fusion and fine-grained domain adaptation approach (AMDFFDA) to improve gearbox fault diagnosis. The method improves performance across operating conditions by combining data-level multi-sensor fusion with domain-adversarial techniques. First, the data from each sensor are preprocessed using the eccentricity technique, and optimal weighting factors are then determined so that the data from multiple sensors are fused with minimum mean square error. Next, an adversarial domain adaptation module extracts domain-invariant features through adversarial training. Finally, a teacher model generates reliable pseudo-labels for the target domain, reducing class-conditional shift between the source and target domains. Results on the MCC5 dataset show that AMDFFDA outperforms competing methods in unsupervised cross-domain fault diagnosis, achieving an average accuracy of 75%. Ablation studies validate the contribution of each module, and parameter sensitivity analyses show how the key parameters affect the model’s effectiveness.

Keywords

gearbox fault diagnosis; multi-sensor data fusion; single domain adaptation; unsupervised learning

Introduction

In modern industry, the efficient and stable operation of mechanical equipment is the cornerstone of production efficiency and operational safety. As a key power transmission component, the gearbox’s operating condition is pivotal to the dependability and longevity of the overall system. However, the gearbox is subjected to complex loads and harsh working environments during transmission, which makes it prone to faults such as gear wear, tooth surface fatigue, missing teeth, and broken teeth.¹ To prevent mechanical failures from causing economic losses and casualties, reliable and automated fault diagnosis of rotating machinery is essential for effective equipment maintenance and management. In sophisticated mechanical systems, overreliance on signals from a single sensor can omit vital information, compromising the diagnostic process.

As sensor technology, artificial intelligence, and big data analytics continue to evolve, data collection during gearbox operation has become more convenient and reliable, which provides a solid foundation for data-driven fault diagnosis. At present, methods based on the collection and analysis of vibration signals dominate because of their mature technical system and high accuracy.² Nonetheless, the fault information that a single sensor can capture about a gearbox is inherently limited, and external noise and environmental fluctuations further weaken the ability to monitor the gearbox’s overall operating status. To overcome these constraints and enhance the precision and dependability of gearbox fault diagnosis, researchers have increasingly adopted multi-sensor data collected from multiple critical positions and directions, leveraging information fusion techniques to improve diagnostic accuracy. Yang et al.³ designed a data fusion algorithm that integrates signals from various sensors at the data level. This algorithm is underpinned by an improved fuzzy support system and employs an adaptive noise technique to decompose and reconstruct the sensor signals. Feature-level fusion applies feature selection and extraction algorithms that deeply integrate the information carried by the data of different sensors. Guo et al.⁴ adopted a multi-scale cyclic frequency demodulation framework to achieve high-precision diagnosis of gearbox faults by calculating the covariance matrix of multi-sensor information. Li et al.⁵ introduced a deep fusion network that spans multiple layers and incorporates an attention mechanism. This setup integrates deep features from both the branch and central networks, thereby accomplishing adaptive, hierarchical information fusion.
Shi et al.⁶ fused decision information based on DS evidence theory and a convolutional network architecture to identify faults in a hydraulic reversing valve. Fu et al.⁷ used the diagnostic outcomes of the GA-BP algorithm as the basic probabilities for gearbox fault diagnosis, and integrated DS evidence theory with GA to fuse decisions, thereby arriving at the final identification of faults. Although the above methods have improved diagnostic performance to some extent, they often struggle to mitigate the effects of redundancy and localized anomalies within multi-sensor data. In addition, uneven error distributions among different sensors can lead to biased fusion outcomes, thereby impairing the overall reliability of fault diagnosis.

While multi-sensor data fusion has been successful in fault diagnosis, its effectiveness in predictive maintenance tasks across various operating conditions is fundamentally constrained by the assumption of distribution consistency.⁸ However, real-world industrial data often violate this assumption. When the distribution of multi-sensor data shifts, the efficacy of fault diagnosis deteriorates. With the advent of transfer learning, researchers have started to apply it to the problem of diagnosing faults in mechanical equipment across varying operating conditions. Liu et al.⁹ accomplished the adaptive weighting of multi-sensor data fusion by employing a shared feature extraction module. This module specializes in domain-specific feature extraction, using the local maximum mean discrepancy (LMMD) to align disparate subdomains. The output decision module then integrates the classification results from each source domain based on a weighted score calculated using the maximum mean discrepancy, thereby facilitating the diagnosis of faults across different domains for axial piston pumps. Jiang et al.¹⁰ studied a multi-sensor data adaptive weighting strategy (MSAWS) combined with a semi-supervised OTPCL framework. This method dynamically determines the significance of the data from each sensor and assigns appropriate weights. It improves the extraction of distinctive, domain-agnostic features by fine-tuning the temperature coefficient, which in turn enables cross-domain fault diagnosis even when labeled data are scarce. Zhang et al.¹¹ introduced a framework for extracting and merging features from multiple sensors, integrating both convolutional neural networks and Transformer models.
They enabled domain-invariant learning by employing a weighted adversarial approach and threshold-based supervised loss functions, and incorporated entropy maximization and minimum loss to classify samples from unknown categories, thereby improving cross-domain fault diagnosis for rotating equipment under changing operating conditions. Warke et al.¹² introduced a TrAdaBoost regressor-based approach for instance domain adaptation, and improved the precision of tool wear estimation and the generalizability of the model across various machining environments by introducing a Squeeze-and-Excitation module to assign different weights to multi-sensor pixel matrices. Lin et al.¹³ introduced a few-shot meta-transfer diagnosis method based on information fusion-based model-agnostic meta-learning (IFMAML). By enhancing and fusing the RGB channel information of multi-sensor data, this method aims to improve diagnostic accuracy and computational efficiency across diverse working scenarios. Nevertheless, current approaches typically concentrate on global domain alignment and often overlook the distributional shifts among individual classes. Additionally, due to the absence of annotated data in the target domain, these methods generally lack mechanisms to provide class-specific guidance, thereby restricting their capacity to achieve substantial performance gains in target-domain classification tasks.

Despite recent advancements, fault diagnosis methods that integrate multi-sensor data fusion and transfer learning still face several challenges. Most existing approaches overlook the non-uniformity of sensor noise characteristics and are unable to effectively handle redundant or anomalous inputs, thereby compromising the robustness and representativeness of the fused features. These methods typically compress all sensor inputs into a unified representation, discarding potentially critical differences embedded in the raw signals, which further limits the diversity of feature representations across source and target domains. Moreover, conventional transfer learning frameworks primarily focus on aligning global distributions, lacking fine-grained mechanisms to guide class-level discrimination in unlabeled target domains. This shortfall undermines the model’s ability to adapt to nuanced class structures and ultimately constrains its transfer performance under complex operating conditions.

To address the aforementioned challenges, this study proposes a gearbox fault diagnosis method based on adaptive multi-sensor data-level fusion and fine-grained domain adaptation (AMDFFDA). The method comprises three core modules: a multi-sensor data-level fusion module, an adversarial domain adaptation module, and a category-conditional alignment module. Specifically, the multi-sensor data-level fusion module employs an eccentric distance–based technique to uniformly preprocess individual sensor signals. It then performs adaptive fusion through an optimal weighting strategy, and combines the fused output with original sensor data to construct multi-sensor, multi-level datasets, thereby enhancing the completeness of fault information. The adversarial domain adaptation module introduces class-discriminative information into the domain discriminator. By minimizing inter-domain discrepancies through adversarial training, it enhances the model’s global generalization ability under varying operating conditions. The category-conditional alignment module is designed to generate reliable pseudo-labels for the unlabeled target domain, thereby mitigating conditional shift between source and target domain categories and improving class-level alignment accuracy. A series of ablation experiments and comparative evaluations with existing diagnostic methods have been conducted, demonstrating the feasibility and effectiveness of the proposed method in gearbox fault diagnosis tasks. The main contributions of this work can be summarized as follows:

(1) Considering that sensor data in real-world monitoring environments often suffer from varying levels of noise and redundancy, which can undermine both diagnostic accuracy and robustness, this study introduces an adaptive multi-sensor data-level fusion approach. Specifically, a preprocessing technique based on eccentric distance is applied to individual sensor signals to eliminate outliers, followed by a fusion strategy that minimizes the total mean square error (MSE), thereby achieving optimized and denoised data integration.

(2) To prevent the loss of initial fault information during the fusion of multi-sensor data, this study proposes a method for constructing a multi-sensor, multi-level dataset. By integrating preprocessed single-sensor data with fused multi-sensor data, the approach not only reduces redundancy in the original signals but also preserves the integrity of fault-relevant information, thereby significantly improving the classification accuracy of the diagnostic model.

(3) To achieve precise alignment of multi-sensor data distributions under varying working conditions, this study employs a teacher model to generate reliable pseudo-labels for unlabeled data originating from novel operating scenarios. This strategy effectively alleviates feature discrepancies among corresponding categories across domains and enhances the fine-grained accuracy of cross-domain feature alignment within the adversarial domain adaptation process.

The remainder of this paper is structured as follows: Section “Preliminaries” formulates the problem and presents the proposed gearbox fault diagnosis method based on adaptive multi-sensor data-level fusion and fine-grained domain adaptation. Section “Results and discussion” presents a series of comparative experiments conducted on public datasets, including parameter sensitivity analyses. Finally, Section “Conclusions” summarizes the principal findings and contributions of this work.

Preliminaries

Problem formulation

This paper examines the issue of cross-domain fault diagnosis for gearboxes, leveraging multi-sensor data across various operating conditions. In general, the approach outlined in this paper is predicated on the following presuppositions:

(1) For the source domain task, labeled multi-sensor data is acquired under specific operating states of the machine. These sensors are of the same type but are placed in dissimilar locations.

(2) For the target domain task, the absence of labels in multi-sensor data precludes its direct application in training fault diagnosis models.

(3) The diagnostic tasks across domains are identical in that they share a common label space.

Let $X$ denote the input space and $Y$ the corresponding label space. $D_{S} = \{X_{S}^{i}, y_{S}^{i}\}_{i = 1}^{n_{S}} = \{(a_{S}^{i 1}, a_{S}^{i 2}, \dots, a_{S}^{im}), y_{S}^{i}\}_{i = 1}^{n_{S}}$ denotes all samples and their labels from the source domain, where $n_{S}$ is the number of source-domain samples. $a_{S}^{i 1}, a_{S}^{i 2}, \dots, a_{S}^{im}$ are the data collected from m distinct sensors for the same source-domain sample, and $y_{S}^{i} \in \{1, 2, \dots, N\}$ is the label shared by these sensor signals. $D_{T} = \{X_{T}^{j}\}_{j = 1}^{n_{T}} = \{(a_{T}^{j 1}, a_{T}^{j 2}, \dots, a_{T}^{j m})\}_{j = 1}^{n_{T}}$ denotes all unlabeled samples in the target domain, where $n_{T}$ is the number of target-domain samples, and $a_{T}^{j 1}, a_{T}^{j 2}, \dots, a_{T}^{j m}$ are the data from m distinct sensors for a single target-domain sample, which share one (unknown) label. In this research, $D_{S}$ and $D_{T}$ are drawn from the joint distributions $P (X, Y)$ and $Q (X, Y)$ , respectively, and these two distributions satisfy $P \neq Q$ . Under these assumptions, the goal is to construct a deep learning model $y = g (x)$ and to employ adversarial learning to mitigate the effect of domain shift on the target domain’s feature extractor $E_{T}$ and classifier $C_{T}$ , thereby reducing the target-domain risk $R_{T} (g) = \Pr_{(x, y) \sim Q} [g (x) \neq y]$ .

The proposed method

Addressing the issues of robustness deficiency, loss of raw information, and fine-grained category ambiguity in current multi-sensor data fusion methods for cross-domain gearbox fault diagnosis, this article introduces a novel diagnostic approach that leverages adaptive multi-sensor data-level fusion along with fine-grained domain adaptation. Figure 1 shows the framework of the suggested approach, comprising three principal components:

(1) Multi-sensor data-level fusion module: This module employs an eccentricity-based preprocessing technique to detect and eliminate outliers within each sensor’s data. Subsequently, an adaptive weighting strategy guided by the principle of minimizing the total mean squared error is adopted to fuse the denoised signals at the data level. In addition, by integrating the robust preprocessed single-sensor data with the fused multi-sensor data, a multi-sensor, multi-level dataset is constructed, thereby enhancing the completeness of fault-related information. This design effectively reduces the risk of overfitting caused by noise or redundant inputs, thus addressing the challenge of insufficient robustness.

(2) Adversarial domain adaptation module: This module constructs an adversarial training framework comprising a domain discriminator and a target feature extractor to learn domain-invariant representations. During training, the feature extractor is optimized to generate features that confuse the domain discriminator, thereby aligning the marginal feature distributions between the source and target domains. By jointly minimizing the classification loss and the domain discrimination loss, the model ultimately learns domain-agnostic features, achieving an initial alignment of feature distributions across domains.

(3) Category-conditional alignment module: In this module, a teacher network constructed based on the Exponential Moving Average (EMA) of the target model’s parameters is employed to generate pseudo-labels for unlabeled samples in the target domain. Only class predictions with probabilities exceeding a predefined threshold are retained as pseudo-labels to ensure their reliability. The target model is subsequently trained by minimizing the cross-entropy loss between its predictions and the generated pseudo-labels, thereby aligning the class-conditional feature distributions between the source and target domains. Finally, the target model is optimized using a joint loss composed of the adversarial domain discrimination loss and a weighted class-conditional alignment loss. Building upon the marginal feature alignment achieved in the previous stage, this design enables fine-grained, category-level alignment across domains and enhances the model’s ability to adapt to subtle inter-class distributional discrepancies.

Figure 1.

Architecture of the suggested approach.

Multi-sensor data level fusion module

The multi-sensor data-level fusion module presented in this study consists of two parts: single sensor data preprocessing and multi-sensor data fusion, as illustrated in Figure 2. It intends to reduce the redundancy and noise interference of measurement data thereby enhancing the precision and dependability of signal sequences.

Figure 2.

Multi-sensor data level fusion module.

Single sensor data preprocessing

When multiple sensors monitor vibration at key components of a gearbox, sensor failure or environmental interference can cause significant deviations in the data. If abnormal data are not identified and eliminated in advance, noise is amplified during data fusion, reducing the signal-to-noise ratio and increasing the error of the fused data. To this end, this study uses the center deviation analysis method to preprocess the measurement data of each sensor. In this method, if the deviation distance of an observation exceeds a predefined threshold, the observation is considered likely to impair the accuracy of data fusion and is excluded. Assume that m sensors are deployed to simultaneously track the gearbox’s operational status, and that the observation sequence collected by each sensor contains n data points. The eccentricity distance for a single sensor’s observation is then defined as:

d_{i} = | x_{i} - \bar{x} |

(1)

where, $\bar{x}$ denotes the average of all observations from the pth sensor, $x_{i}$ is the ith observation.

The eccentricity distance is calculated for every observation of each sensor. The observations are ranked by eccentricity distance in descending order, and the c observations that exceed the set threshold are removed. The reliable data points are obtained as follows:

x_{i}^{'} = \begin{cases} x_{i}, & d_{i} \leq τ \\ 0, & d_{i} > τ \end{cases}

(2)

where, $x_{i}^{'}$ represents the observation value remaining after the elimination of c observations, and $τ$ denotes the set eccentricity rejection threshold.
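As an illustration, the outlier screening of equations (1) and (2) can be sketched in Python with NumPy. The function names, the sample readings, and the threshold value are hypothetical. Rejected observations are dropped here rather than set to zero, which is an assumption made so that only the n−c retained points enter later steps.

```python
import numpy as np

def eccentricity_distances(x):
    """Eq. (1): absolute deviation of each observation from the sensor mean."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean())

def reject_outliers(x, tau):
    """Eq. (2): retain observations whose eccentricity distance is <= tau.

    Returns the retained values and the boolean keep-mask.
    Assumption: rejected points are removed instead of being zeroed.
    """
    x = np.asarray(x, dtype=float)
    d = eccentricity_distances(x)
    keep = d <= tau
    return x[keep], keep

# Hypothetical readings from one sensor; the last value is a spike.
readings = np.array([1.0, 1.1, 0.9, 1.0, 6.0])
kept, mask = reject_outliers(readings, tau=2.0)
```

With these readings the mean is 2.0, so only the spike has a distance above the threshold and the first four observations are retained.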

For the other n−c data points from the p-th sensor, single sensor weighted preprocessing is performed. The closer the observation is to the center, the stronger its coordination with other data points is, and the corresponding weight coefficient ought also to be increased. Based on this rule, the weight coefficient $w (i)$ for the ith observation is:

w (i) = \frac{1}{[d (i) + 1] \sum_{j = 1}^{n - c} \frac{1}{[d (j) + 1]}}

(3)

where, $w (i)$ lies between 0 and 1, and $\sum_{i = 1}^{n - c} w (i) = 1$ . Adding 1 to $d (i)$ in the denominator keeps the weight well defined when the eccentricity distance is zero. The preprocessed value is then calculated from the remaining n−c data points as follows:

{\hat{x}}_{p} = \sum_{i = 1}^{n - c} w (i) x_{i}^{'}

(4)

where, ${\hat{x}}_{p}$ is the signal sequence after single sensor weighted preprocessing of the pth sensor. By adopting the same method, the signal sequences after single sensor weighted preprocessing of the remaining m−1 sensors can be calculated accordingly.
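The weighting of equations (3) and (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function name and the example values are hypothetical, and the eccentricity distances of the retained points are passed in explicitly because the text does not state whether they are recomputed after outlier removal.

```python
import numpy as np

def weighted_single_sensor_estimate(x_kept, d_kept):
    """Eqs. (3)-(4): weight each retained observation by 1 / (d + 1),
    normalize the weights to sum to 1, and return the weighted sum.
    """
    x_kept = np.asarray(x_kept, dtype=float)
    d_kept = np.asarray(d_kept, dtype=float)
    inv = 1.0 / (d_kept + 1.0)      # closer to the center -> larger weight
    w = inv / inv.sum()             # Eq. (3)
    x_hat = float(np.dot(w, x_kept))  # Eq. (4)
    return x_hat, w

# Hypothetical retained observations and their eccentricity distances.
x_hat, w = weighted_single_sensor_estimate([1.0, 2.0, 3.0], [0.0, 1.0, 3.0])
```

Here the unnormalized weights are 1, 1/2 and 1/4, so the normalized weights are 4/7, 2/7 and 1/7 and the estimate is 11/7.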

Multi-sensor data fusion module

On the basis of single sensor data preprocessing, multi-sensor data fusion aims to harness the complementary nature of information from various sources and enhance data stability. To this end, this study adopts an adaptive weighted data fusion strategy that minimizes the MSE. Based on the signal sequences obtained from the sensors after single-sensor preprocessing, the optimal weight coefficients of the sensors are determined automatically, and the fused result is finally combined with the preprocessed single-sensor data to form a multi-sensor, multi-level dataset.

Assume that m independent sensor observation sequences are $M_{1}$ , $M_{2}$ , …, $M_{m}$ , and the corresponding variances are $σ_{1}^{2}$ , $σ_{2}^{2}$ , …, $σ_{m}^{2}$ , and each sensor sequence is integrated using a weighted approach based on individual sensor data. The formula for multi-sensor data fusion is as follows:

x^{*} = \sum_{p = 1}^{m} W_{p} {\hat{x}}_{p}

(5)

where, $W_{p}$ is the weighting factor of the pth sensor in the multi-sensor data fusion, and $x^{*}$ represents the multi-sensor fusion data at that moment. According to the literature,¹⁴ the weight coefficient corresponding to the lowest total MSE is:

W_{p}^{*} = \frac{1}{σ_{p}^{2} \sum_{i = 1}^{m} \frac{1}{σ_{i}^{2}}}, \quad p = 1, 2, \dots, m

(6)

where, $W_{p}^{*}$ is the optimal weight coefficient of the pth sensor. The lowest total MSE at this point is:

σ_{min}^{2} = \frac{1}{\sum_{p = 1}^{m} \frac{1}{σ_{p}^{2}}}

(7)
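Equations (5) to (7) amount to inverse-variance weighting, which can be sketched as below. The function names and the numeric values are hypothetical, and the sensor variances are assumed to be known or estimated beforehand, since the text does not describe how they are obtained.

```python
import numpy as np

def optimal_fusion_weights(variances):
    """Eq. (6): weights proportional to 1 / sigma_p^2, normalized to sum to 1."""
    inv = 1.0 / np.asarray(variances, dtype=float)
    return inv / inv.sum()

def fuse(estimates, variances):
    """Eq. (5) with the optimal weights, plus the minimum total MSE of Eq. (7)."""
    w = optimal_fusion_weights(variances)
    fused = float(np.dot(w, np.asarray(estimates, dtype=float)))
    min_mse = float(1.0 / np.sum(1.0 / np.asarray(variances, dtype=float)))
    return fused, w, min_mse

# Two hypothetical sensors: the second is four times noisier than the first.
fused, w, min_mse = fuse([10.0, 12.0], [1.0, 4.0])
```

In this example the weights are 0.8 and 0.2, the fused value is 10.4, and the minimum total MSE is 0.8, which is below the variance of either sensor alone.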

Adversarial domain adaptation module

The adversarial domain adaptation module¹⁵ employs a domain adversarial training mechanism to learn domain-independent feature representations. This module comprises three components: the domain discriminator D, the feature extractor E, and the classifier C. E extracts common features $f = E (x)$ from the input data, while the classifier C predicts the label $c = C (f)$ from these features. The binary domain discriminator D identifies domain-specific information $d = D (f)$ in the feature representation, thereby enabling adversarial training of E. During the pre-training phase in the source domain, the classifier loss $L_{cls}$ is the cross-entropy loss. The parameters of the source feature extractor $E_{S}$ and classifier $C_{S}$ are optimized by minimizing $L_{cls}$ to ensure the capability of the source-domain classifier. It is defined as:

\min_{E_{S}, C_{S}} L_{cls} = - \mathbb{E}_{X_{S} \sim P_{S}} [y_{S}^{T} \log (C_{S} (E_{S} (X_{S})))]

(8)

where, $C_{S} (E_{S} (X_{S}))$ denotes the class probabilities predicted for a source-domain sample, and $y_{S}$ is its one-hot label.

To ensure that the feature extractor and classifier trained on the source domain also perform well on target-domain data, a domain discriminator is introduced to identify the origin of the features and is optimized through an adversarial training strategy. Its loss is calculated as follows:

\begin{aligned} \min_{D} L_{D} = & - \mathbb{E}_{X_{S} \sim P_{S}} [\log (D (E_{S} (X_{S})))] \\ & - \mathbb{E}_{X_{T} \sim P_{T}} [\log (1 - D (E_{T} (X_{T})))] \end{aligned}

(9)

where, $D (\cdot)$ takes values between 0 and 1 and indicates the likelihood that a feature comes from the source domain rather than the target domain. During adversarial learning, the target feature extractor is optimized so that the discriminator can hardly tell whether its output features originate from the source or the target domain. The core goal of this process is to learn a domain-invariant feature representation. The loss of the target feature extractor is calculated as follows:

\min_{E_{T}} L_{dt} = \mathbb{E}_{X_{T} \sim P_{T}} [\log (1 - D (E_{T} (X_{T})))]

(10)
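To make the three losses of equations (8) to (10) concrete, the following NumPy sketch evaluates them on network outputs that are assumed to be already available as arrays. The function names and the small constant `EPS`, added for numerical safety, are not part of the original formulation, and batch means stand in for the expectations.

```python
import numpy as np

EPS = 1e-12  # assumption: guards against log(0); not part of Eqs. (8)-(10)

def classification_loss(probs, labels):
    """Eq. (8): mean cross-entropy. probs has shape (n, K); labels are class indices."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + EPS)))

def discriminator_loss(d_src, d_tgt):
    """Eq. (9): D is rewarded for outputs near 1 on source and near 0 on target."""
    d_src = np.asarray(d_src, dtype=float)
    d_tgt = np.asarray(d_tgt, dtype=float)
    return float(-np.mean(np.log(d_src + EPS)) - np.mean(np.log(1.0 - d_tgt + EPS)))

def target_extractor_loss(d_tgt):
    """Eq. (10): lower when D mistakes target features for source features."""
    d_tgt = np.asarray(d_tgt, dtype=float)
    return float(np.mean(np.log(1.0 - d_tgt + EPS)))

# A discriminator that outputs 0.5 everywhere cannot separate the domains.
l_d = discriminator_loss([0.5, 0.5], [0.5, 0.5])
l_dt = target_extractor_loss([0.5, 0.5])
```

The two objectives pull in opposite directions: raising the discriminator output on target features lowers the extractor loss of equation (10) while increasing the second term of equation (9).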

Category conditional alignment module

Although the adversarial domain adaptation module can align the marginal feature distributions of the source and target domains, it still cannot accurately match the per-category distributions because of class-conditional shift. To address this issue, a teacher model-based credible pseudo-labeling method is proposed to better accommodate fine-grained distributional discrepancies between source and target domains across different categories. This method constructs a stable teacher model via an EMA mechanism and employs a high-confidence thresholding strategy to filter out unreliable predictions, retaining only trustworthy pseudo-labels for guiding the training of the target model.

(1) Teacher model: While the target domain’s feature extractor and classifier are trained, the teacher model $f_{δ}$ is updated at each training step by applying an exponential moving average (EMA)¹⁶ to the target network’s parameter weights $H_{θ_{T}}$ . The teacher produces pseudo-labels for the target domain’s unlabeled data, enabling semi-supervised learning that improves the model’s alignment with the target domain’s class-conditional distributions. The teacher model’s weights are updated as follows:

H_{δ} = α H_{δ} + (1 - α) H_{θ_{T}}

(11)

where, $α$ represents a momentum coefficient that regulates the rate of weight adjustments in the teacher model, and $H_{δ}$ denotes the weights of the teacher model. The teacher model is initialized with the parameters of the source-domain pre-trained model to ensure a baseline classification capability and to provide stable pseudo-label supervision in the early stages of training. The teacher model’s predictive output is calculated as follows:

p_{δ} = f_{δ} (X_{T})

(12)

{\hat{y}}_{δ} = softmax (p_{δ})

(13)

where, $p_{δ}$ represents the feature output of the teacher model, and ${\hat{y}}_{δ}$ represents the class probability distribution obtained after the feature output is normalized by the Softmax function.

(2) Confident pseudo labels: To enhance the precision of pseudo-labels, this study only considers those labels whose probability values exceed a certain threshold in the prediction results of the teacher model,¹⁷ as shown in Figure 3. Specifically, labels with predicted probabilities higher than a predefined confidence threshold are retained as pseudo-labels and incorporated into the subsequent training process. In contrast, predictions that fall below this threshold are discarded to ensure the reliability of the pseudo-label supervision, thereby improving training efficiency.

The retained pseudo-labels are given by:

{\hat{y}}_{ps} = {\hat{y}}_{δ} [\max ({\hat{y}}_{δ}) > ξ]

(14)

where, ${\hat{y}}_{ps}$ represents the retained high-confidence pseudo-labels, and $ξ$ denotes the screening threshold for pseudo-labels.
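The teacher update and the confidence screening described in items (1) and (2) can be sketched together as follows. This is an illustrative sketch under stated assumptions: model parameters are represented as a dictionary of NumPy arrays, the teacher’s raw outputs are supplied directly instead of being computed by a network, and the values of `alpha` and `xi` are hypothetical.

```python
import numpy as np

def ema_update(teacher, student, alpha):
    """Eq. (11): teacher <- alpha * teacher + (1 - alpha) * student, per parameter."""
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}

def softmax(logits):
    """Eq. (13): row-wise softmax, shifted by the row maximum for stability."""
    logits = np.asarray(logits, dtype=float)
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confident_pseudo_labels(teacher_outputs, xi):
    """Keep a pseudo-label only if the top class probability exceeds xi.

    Returns the arg-max class of every sample and a boolean mask of retained ones.
    """
    probs = softmax(teacher_outputs)
    labels = probs.argmax(axis=1)
    mask = probs.max(axis=1) > xi
    return labels, mask

# Hypothetical single-parameter models; the teacher moves 10% toward the student.
teacher = ema_update({"w": np.array([1.0, 1.0])},
                     {"w": np.array([0.0, 2.0])}, alpha=0.9)

# Hypothetical teacher outputs for three target samples and three classes.
outputs = np.array([[4.0, 0.0, 0.0],
                    [0.2, 0.1, 0.0],
                    [0.0, 0.0, 1.0]])
labels, mask = confident_pseudo_labels(outputs, xi=0.9)
```

Only the first sample is retained here: its top probability is about 0.96, whereas the other two stay well below the threshold and are excluded from the alignment loss for this step.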

Figure 3.

Alignment of class-conditional distributions guided by a teacher model.

Notably, pseudo-labels are neither retained nor reused across training steps or epochs. Instead, at each training step, the latest teacher model regenerates pseudo-labels for all target-domain samples, thereby ensuring real-time updates. Pseudo-labels generated in preceding steps are immediately discarded. This mechanism guarantees that each iteration utilizes pseudo-labels derived from the most up-to-date teacher model, dynamically reflecting the ongoing refinement of the model’s representational capacity.

This paper uses the confident pseudo-labels to refine the target model and aligns the class-specific distributions by computing the cross-entropy loss between the pseudo-labels and the predictions of the target-domain model. This loss is defined as:

L_{ca} = - \mathbb{E}_{X_{T} \sim P_{T}} [\sum_{k = 1}^{K} 1_{[{\hat{y}}_{ps} = k]} \log ({\hat{y}}_{T}^{k})]

(15)

where, $L_{ca}$ represents the class-conditional alignment loss, $K$ is the number of classes, and ${\hat{y}}_{T} = C_{T} (E_{T} (X_{T}))$ denotes the class probabilities predicted by the target model.

Overall objective function

For AMDFFDA, the optimization process of the target model is achieved by combining two key loss functions: domain discrimination loss and the weighted class-conditional alignment loss. The overall objective function is delineated by:

\begin{aligned} \min_{E_{T}} L_{overall} &= L_{dt} + λ L_{ca} \\ &= \mathbb{E}_{X_{T} \sim P_{T}} [\log (1 - D (E_{T} (X_{T}))) - λ {\hat{y}}_{ps}^{T} \log (C_{T} (E_{T} (X_{T})))] \end{aligned}

(16)

where, $λ$ is the weighting coefficient of the class-conditional alignment loss.
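The joint objective of equation (16) can be illustrated with the NumPy sketch below. The function names, the constant `EPS`, and the example numbers are hypothetical. Averaging the alignment term over the retained samples only, and returning zero when no pseudo-label passes the threshold, are assumptions, because the text does not specify the normalization.

```python
import numpy as np

EPS = 1e-12  # assumption: guards against log(0)

def alignment_loss(student_probs, pseudo_labels, mask):
    """Eq. (15): cross-entropy between target-model predictions and retained
    pseudo-labels. Assumption: averaged over retained samples, 0 if none."""
    student_probs = np.asarray(student_probs, dtype=float)
    pseudo_labels = np.asarray(pseudo_labels, dtype=int)
    rows = np.flatnonzero(np.asarray(mask, dtype=bool))
    if rows.size == 0:
        return 0.0
    picked = student_probs[rows, pseudo_labels[rows]]
    return float(-np.mean(np.log(picked + EPS)))

def overall_loss(d_tgt, student_probs, pseudo_labels, mask, lam):
    """Eq. (16): L_dt + lambda * L_ca for one batch of target samples."""
    d_tgt = np.asarray(d_tgt, dtype=float)
    l_dt = float(np.mean(np.log(1.0 - d_tgt + EPS)))  # Eq. (10)
    l_ca = alignment_loss(student_probs, pseudo_labels, mask)
    return l_dt + lam * l_ca

# Two target samples; only the first has a confident pseudo-label.
total = overall_loss(d_tgt=[0.5, 0.5],
                     student_probs=[[0.5, 0.5], [0.25, 0.75]],
                     pseudo_labels=[0, 1],
                     mask=[True, False],
                     lam=1.0)
```

In this toy batch the adversarial term equals log 0.5 and the alignment term equals −log 0.5, so the two cancel and the total is zero.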

Diagnostic procedure

The diagnostic procedure of the AMDFFDA method offered in this article is depicted in Figure 4. It encompasses four primary stages:

(1) Gearbox multi-sensor data acquisition: Multiple acceleration sensors are installed in advance at key positions of the gearbox, and gearbox operation data under different working conditions are collected over a certain period of time. Typically, the multi-sensor data of the source domain are collected under specific operating conditions and are labeled, while the multi-sensor data collected under different operating conditions, known as the target domain, are unlabeled.

(2) Multi-sensor data level fusion: The eccentricity of all observations of a single sensor is calculated, and abnormal data with large eccentricity are eliminated in proportion. Then, according to the consistency test principle, the weighting factors of the remaining observations are obtained and the weighted fusion of the single sensor is realized. On this basis, an adaptive weighted fusion estimation approach that minimizes the overall MSE of the fused data is employed to find the optimal weighting factors of the sensors adaptively, thereby fusing the data from the various sensors efficiently. The preprocessed single-sensor data are then integrated with the multi-sensor fusion data to form a multi-sensor, multi-level dataset.

(3) Model training: The proposed AMDFFDA model is first subjected to supervised pretraining using the processed labeled source domain data, where the classification loss is calculated according to equation (8) to update the model parameters. Subsequently, both source and target domain data are fed into the model for adversarial training. A teacher–student framework is adopted, in which the target domain model is initialized with the pretrained source model parameters and updated independently, while the teacher model is dynamically updated via the EMA mechanism and is used to generate pseudo-labels for the target domain at each training step. The training process consists of two stages: in the first stage, the domain discrimination loss is computed using equation (9) to update the domain discriminator; in the second stage, high-confidence pseudo-labels generated by the teacher model are selected, and the total loss defined by equation (16) is calculated to optimize the target model. This process is iteratively performed over the joint training set of the source and target domains.

(4) Fault diagnosis: The test data from the unlabeled target domain are fed into the trained model. The encoder $E_{T}$ extracts target-adapted features, which the target classifier $C_{T}$ then uses to predict the fault category.

Figure 4.

Diagnostic flowchart.
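The adaptive weighting in step (2) can be illustrated with a minimal sketch. It assumes the weighting takes the standard inverse-variance form, which minimizes the MSE of a weighted average of unbiased, mutually independent sensor estimates whose weights sum to one. The eccentricity-based screening is not reproduced here, and the function names and the toy variances are hypothetical rather than taken from the paper.

```python
def inverse_variance_weights(variances):
    """Return weights w_i = (1/var_i) / sum_j (1/var_j); they sum to 1."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def fused_variance(weights, variances):
    """Variance (MSE) of the weighted average for independent, unbiased sensors."""
    return sum(w * w * v for w, v in zip(weights, variances))


def fuse(samples, weights):
    """Fuse one simultaneous reading per sensor into a single value."""
    return sum(w * x for w, x in zip(weights, samples))


# Hypothetical noise variances for two sensors (assumed already estimated).
variances = [1.0, 4.0]
weights = inverse_variance_weights(variances)     # the less noisy sensor gets more weight
equal_weights = [0.5, 0.5]

print("weights:", weights)
print("MSE with adaptive weights:", fused_variance(weights, variances))
print("MSE with equal weights:", fused_variance(equal_weights, variances))
print("fused reading:", fuse([2.0, 3.0], weights))
```

In this toy case the adaptive weights give a lower fused MSE than either sensor alone or a plain average, which is the property the fusion step relies on.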

Results and discussion

Experimental dataset description

This paper employs the Tsinghua University gearbox multi-mode fault diagnosis dataset (MCC5) to confirm the efficacy of the suggested approach. The experimental platform used to collect this dataset is depicted in Figure 5. It comprises a 2.2 kW three-phase asynchronous motor, a two-stage parallel gearbox, a torque sensor, a magnetic powder brake serving as a torque generator, and a measurement and control system.¹⁸ Fault gears were introduced into the intermediate shaft of the gearbox using laser etching technology. Two triaxial vibration accelerometers (model TES001V, with a sensitivity of 100 mV/g) were positioned at the motor drive end and the bearing seat of the gearbox’s intermediate shaft. These sensors monitor the triaxial vibration signals of the associated components under various operating conditions with a sampling rate of 12.8 kHz.

Figure 5.

Data acquisition test bench.

To investigate the fault properties of the gearbox under constant working conditions, this paper covers six health states: healthy, gear pitting, gear wear, missing teeth, broken teeth, and tooth cracks. These states are coded from 0 to 5, as listed in Table 1. In the experiment, 20 s of vibration data were collected for each health state and working condition through six synchronously sampled acceleration channels (two triaxial accelerometers), giving 256,000 sampling points per channel. To extract features from these vibration signals effectively, a sliding window with a length of 1024 sampling points and a step of 512 sampling points is used to cut samples from the original data. According to this method, 500 samples can be obtained for each health state. Furthermore, to assess the efficacy and generalization of the model, the samples were split into a training set and a test set in a 4:1 ratio.

Table 1.

Gearbox fault type information.

Fault type	Sample length	Training sample:Test sample	Class label
Health	6 × 1024	400:100	0
Gear pitting	6 × 1024	400:100	1
Gear wear	6 × 1024	400:100	2
Miss teeth	6 × 1024	400:100	3
Teeth break	6 × 1024	400:100	4
Teeth crack	6 × 1024	400:100	5
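The sample counts above follow from simple window arithmetic, sketched below. The helper names are hypothetical, and the sketch assumes that only full windows are kept and that no padding is applied. Under those assumptions a single 256,000-point record yields 499 windows, one short of the 500 reported; how the record boundary is handled to reach 500 is not specified in the text.

```python
def count_windows(n_points, window, step):
    """Number of full windows that fit in a record without padding."""
    if n_points < window:
        return 0
    return (n_points - window) // step + 1


def window_starts(n_points, window, step):
    """Start index of every full window."""
    return list(range(0, n_points - window + 1, step))


sampling_rate_hz = 12_800
duration_s = 20
n_points = sampling_rate_hz * duration_s            # 256,000 points per channel

n_windows = count_windows(n_points, window=1024, step=512)
starts = window_starts(n_points, window=1024, step=512)

print("points per channel:", n_points)
print("full windows:", n_windows)
print("last window covers:", starts[-1], "to", starts[-1] + 1024)
```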

Comparison methods and implementation details

Task definition

For each health state, three different operating conditions are designed for the experiment (denoted as A, B, and C), namely, operating at a speed of 3000 rpm under a constant load of 10 N m, and running at speeds of 2000 and 3000 rpm under a constant load of 20 N m, as shown in Table 2. This paper mainly focuses on the following six tasks, namely A→B, A→C, B→A, B→C, C→A, and C→B, which are defined as tasks T0 to T5 respectively.

Table 2.

Gearbox operating information.

Condition	Rotational speed (rpm)	Load torque (N m)
A	3000	10
B	2000	20
C	3000	20
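The six transfer tasks are the ordered source-to-target pairs of the three operating conditions. A minimal sketch of that enumeration is given below; the dictionary layout is a hypothetical convenience, while the speeds, loads, and task numbering come from Table 2 and the task definition.

```python
from itertools import permutations

# Operating conditions from Table 2: (rotational speed in rpm, load torque in N m).
conditions = {"A": (3000, 10), "B": (2000, 20), "C": (3000, 20)}

# Ordered (source, target) pairs, numbered T0 to T5 in the order given in the text.
tasks = {
    f"T{i}": (source, target)
    for i, (source, target) in enumerate(permutations(conditions, 2))
}

for name, (source, target) in tasks.items():
    print(name, f"{source}→{target}", conditions[source], "->", conditions[target])
```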

Parameter settings

In all experiments, the networks were optimized by standard back-propagation with the Adam optimizer, and the weight decay coefficient was set to 0.0001 to reduce the risk of overfitting. All models were trained for the same number of iterations, 120. The domain discriminator and feature extractor were both assigned a learning rate of 0.001, and the batch size was set to 32. The domain discriminator consists of a sequence of fully connected layers with intermediate batch normalization and ReLU activations, with output dimensions of 256, 128, 64, and 1, respectively. To ensure fair and repeatable results, every model was trained and tested five times independently with different random seeds. Importantly, all comparative methods adopted the convolutional neural network architecture defined in Table 3 as a unified backbone for fair comparison. The number of input channels depends on whether the method incorporates the multi-sensor data level fusion module: seven channels when fusion is applied and six otherwise. For the class alignment module, the trade-off parameter was set to 0.0005, the confidence threshold for pseudo-label screening to 0.85, and the teacher model’s momentum parameter to 0.9.

Table 3.

CNN backbone architecture.

Operator	Kernel/stride	Outputs	Activation
Input	—	(7,1024)	—
Convolution	7 × 1 × 3/1	(32,1024)	ReLU
BN	—	(32,1024)	—
Pooling	1 × 2/2	(32,512)	—
Convolution	32 × 1 × 3/1	(64,512)	ReLU
BN	—	(64,512)	—
Pooling	1 × 2/2	(64,256)	—
Convolution	64 × 1 × 3/1	(128,256)	ReLU
BN	—	(128,256)	—
Pooling	1 × 2/2	(128,128)	—
Flatten	—	(,131072)	—
FC	—	(,256)	ReLU
FC	—	(,128)	ReLU
FC	—	(,64)	ReLU
Classifier	—	(,6)	Softmax

Note: Output dimensions are formatted as (Channels, Sequence length). Batch size is omitted for clarity.
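The output shapes in Table 3 can be traced with the usual one-dimensional convolution and pooling length formulas. The sketch below assumes a padding of 1 for the kernel-3 convolutions, which is what keeps the sequence length unchanged as the table shows; the padding value and the helper names are assumptions, not details given in the paper. Under these assumptions the flattened size comes out as 128 × 128 = 16,384, whereas the table lists 131,072 (which equals 128 × 1024); the sketch does not resolve which figure matches the authors' implementation.

```python
def conv1d_length(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution."""
    return (length + 2 * padding - kernel) // stride + 1


def pool1d_length(length, kernel, stride):
    """Output length of a 1-D pooling layer without padding."""
    return (length - kernel) // stride + 1


def trace_backbone(in_channels=7, length=1024):
    """Return (operator, channels, length) after each stage of the backbone."""
    shapes = [("Input", in_channels, length)]
    channels = in_channels
    for out_channels in (32, 64, 128):
        # Kernel 3, stride 1; padding=1 is an assumption that preserves the length.
        length = conv1d_length(length, kernel=3, stride=1, padding=1)
        channels = out_channels
        shapes.append(("Convolution+BN", channels, length))
        length = pool1d_length(length, kernel=2, stride=2)
        shapes.append(("Pooling", channels, length))
    shapes.append(("Flatten", 1, channels * length))
    return shapes


shapes = trace_backbone()
for operator, channels, length in shapes:
    print(f"{operator:15s} ({channels},{length})")
```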

Comparison method description

To demonstrate the benefits of our approach, the proposed method is compared with four other approaches on the same dataset. These methods include:

(1) Convolutional Neural Network (CNN): This baseline model is trained using the labeled source domain data and directly evaluated on the unlabeled target domain data, without incorporating the domain adversarial training or the category conditional alignment modules.

(2) Deep Adaptation Network (DAN)¹⁹: A classic approach in unsupervised cross-domain learning, DAN improves the transferability of features in the task-specific layers of a deep neural network by minimizing the discrepancy between domains, using MK-MMD to match the mean embeddings of the domain distributions.

(3) Domain Adversarial Neural Network (DANN)¹⁵: A typical cross-domain transfer strategy that borrows the adversarial idea of Generative Adversarial Networks (GANs). A domain discriminator is embedded in the network and trained adversarially against the feature extractor, which pushes the feature extractor to learn domain-independent representations. A gradient reversal layer implements this adversarial objective within ordinary back-propagation by reversing the sign of the discriminator’s gradient before it reaches the feature extractor. In this study, the adversarial learning structure employed in DANN is kept consistent with that used in the proposed method.

(4) MFET²⁰: This method proposes a lightweight fault diagnosis framework that integrates multi-sensor data fusion with a modified EfficientNetV2 backbone and transfer learning. The fused signals are input into the improved EfficientNetV2 for feature extraction, followed by domain adaptation to handle unseen target domains. In this study, to ensure a fair comparison, we replace its original backbone with the same CNN architecture defined in Table 3 and adopt the same dataset configuration.

Experimental results analysis and discussion

Analysis of multi-sensor data-level fusion results

Following the approach described in Section “Multi-sensor data level fusion module,” the observations whose eccentricity falls in the top 10% are excluded as outliers. Figure 6 shows the time-domain waveforms of the raw vibration signals, the single-sensor fused signals, and the multi-sensor fused signal of the gearbox, monitored synchronously by six groups of vibration sensors in the gear pitting state at a load of 10 N m and a speed of 1000 rpm.

Figure 6.

Time-domain graphs of multi-sensor data: (a) original data, (b) preprocessing data, and (c) multi-sensor fusion data.

As observed in the figure, after the single-sensor data are fused by weighting with the eccentricity-based consistency check algorithm, the fused signal shows a more uniform amplitude distribution than the original signal. This indicates that consistency preprocessing removes random fluctuations, markedly reduces the influence of redundancy and outliers, and improves the consistency of the data. Furthermore, the minimum-error multi-sensor data fusion integrates the monitoring results of multiple sensors, isolating non-fault-related information such as noise while retaining the signal components closely related to the fault characteristics. The adaptive weighted data-level fusion strategy therefore not only enhances the reliability of the signal but also captures variations in the gear’s operational condition more completely, providing more comprehensive data support for the health status assessment and fault diagnosis of the gearbox.

Diagnosis results analysis and discussion

Table 4 reports the average accuracy and variance over five random seeds for the proposed approach and the comparison methods. The best-performing accuracy for each cross-domain scenario is highlighted in bold, with the second-best underscored. Overall, AMDFFDA achieves the highest average accuracy of 0.75, substantially outperforming CNN (0.37), DAN (0.45), DANN (0.44), and MFET (0.58), which demonstrates its superior cross-domain diagnostic capability. As a baseline, CNN lacks both data preprocessing and domain adaptation mechanisms, resulting in consistently low performance. Although DAN introduces MK-MMD to reduce domain discrepancies, it struggles to capture discriminative features when the multi-sensor data contain redundancy and noise. Similarly, DANN employs adversarial training to encourage domain-invariant features, but its simple global alignment mechanism is insufficient to handle the complex noise patterns in multi-sensor environments. MFET performs better, with an average accuracy of 0.58, owing to its use of sensor fusion and transfer learning within a lightweight framework. However, it still falls short of AMDFFDA, indicating that while basic integration strategies offer generalization benefits, they are inadequate for addressing fine-grained category shifts and heterogeneous distribution noise. These comparisons underscore the necessity of more sophisticated fusion and alignment strategies in cross-domain fault diagnosis using multi-sensor data.

Table 4.

Performance statistics of different cross-domain fault diagnosis methods.

Task	CNN	DAN	DANN	MFET	AMDFFDA
A→B	0.21 ± 0.002	0.19 ± 0.001	0.13 ± 0.018	0.45 ± 0.002	0.62 ± 0.007
A→C	0.41 ± 0.001	0.59 ± 0.003	0.56 ± 0.007	0.59 ± 0.005	0.63 ± 0.007
B→A	0.19 ± 0.001	0.28 ± 0.007	0.23 ± 0.001	0.47 ± 0.003	0.62 ± 0.005
B→C	0.53 ± 0.011	0.59 ± 0.005	0.64 ± 0.016	0.69 ± 0.005	0.99 ± 0.004
C→A	0.37 ± 0.010	0.51 ± 0.001	0.47 ± 0.020	0.57 ± 0.010	0.65 ± 0.005
C→B	0.51 ± 0.005	0.58 ± 0.001	0.61 ± 0.004	0.72 ± 0.004	0.97 ± 0.003
Average	0.37 ± 0.003	0.45 ± 0.003	0.44 ± 0.006	0.58 ± 0.005	0.75 ± 0.005
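The averages in Table 4 can be checked directly from the per-task entries. The sketch below recomputes them from the rounded values printed in the table; the dictionary layout is a hypothetical convenience. Because the inputs are already rounded to two decimals, a recomputed mean can differ from the reported one in the last digit (DAN, for example, recomputes to roughly 0.457 against the reported 0.45).

```python
# Per-task mean accuracies from Table 4, in task order A→B, A→C, B→A, B→C, C→A, C→B.
accuracy = {
    "CNN":     [0.21, 0.41, 0.19, 0.53, 0.37, 0.51],
    "DAN":     [0.19, 0.59, 0.28, 0.59, 0.51, 0.58],
    "DANN":    [0.13, 0.56, 0.23, 0.64, 0.47, 0.61],
    "MFET":    [0.45, 0.59, 0.47, 0.69, 0.57, 0.72],
    "AMDFFDA": [0.62, 0.63, 0.62, 0.99, 0.65, 0.97],
}

means = {method: sum(values) / len(values) for method, values in accuracy.items()}
best = max(means, key=means.get)
runner_up = sorted(means, key=means.get, reverse=True)[1]

for method, mean in means.items():
    print(f"{method:8s} {mean:.3f}")
print("best:", best, "| margin over", runner_up, f"{means[best] - means[runner_up]:.3f}")
```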

To facilitate a clearer comparison across methods, Figure 7 presents the confusion matrices of all approaches on the B→C task. CNN, DAN, and DANN exhibit relatively high accuracy in identifying Gear pitting and Teeth crack, but their accuracy on the other categories is often below 50%. This reflects their limited ability to distinguish subtle fault patterns in the presence of redundant or noisy sensor signals, resulting in blurred class boundaries and poor generalization. The MFET method benefits from sensor fusion and lightweight transfer learning, showing improved results in classes such as Miss teeth and Gear pitting. However, it still suffers from noticeable misclassification in Health and Gear wear, indicating its lack of fine-grained domain alignment and adaptive feature filtering. In contrast, the proposed AMDFFDA method achieves consistently high accuracy across all six classes, with each category exceeding 96%. This highlights its robustness in suppressing noise, preserving complementary fault information, and aligning domain features at both global and class levels, which gives it superior generalization in complex transfer scenarios.

Figure 7.

Confusion matrix of the results of each model in task B→C.
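The per-class figures read from a confusion matrix are the diagonal entries divided by the row totals. The sketch below shows that calculation on a hypothetical three-class matrix, assuming rows are true classes and columns are predictions; the counts are illustrative only and are not taken from Figure 7.

```python
def per_class_accuracy(matrix):
    """Fraction of each true class (row) that was predicted correctly."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]


def overall_accuracy(matrix):
    """Correct predictions over all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical counts: rows are true classes, columns are predicted classes.
confusion = [
    [97, 2, 1],
    [0, 99, 1],
    [3, 0, 97],
]

class_accuracy = per_class_accuracy(confusion)
print("per-class accuracy:", class_accuracy)
print("overall accuracy:", overall_accuracy(confusion))
```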

To evaluate the feature extraction capability of the different approaches more clearly, all test samples from the target domain are fed into the feature extraction network, and the output features of the final pooling layer are collected. These high-dimensional features are then mapped into a low-dimensional space with the t-distributed stochastic neighbor embedding (t-SNE) technique²¹ for visualization. Figure 8 presents the t-SNE visualizations of the target domain features for task B→C. As shown, CNN lacks domain adaptation and exhibits loose clusters with considerable overlap, especially between T-0 and T-4, indicating poor feature discriminability. DAN improves aggregation for some classes such as T-0 and T-4 due to MK-MMD-based global alignment, but still suffers from blurred class boundaries because it neglects category-conditional alignment. DANN achieves slightly better clustering, particularly for T-4 and T-5, yet overlaps remain between T-0 and T-3 due to its limited fine-grained alignment. MFET achieves improved clustering over the earlier baselines, with clearer separation between T-4 and T-5, but T-0 and T-2 still exhibit noticeable overlap, suggesting that its integration of sensor fusion and transfer learning is insufficient for fine-grained alignment. In contrast, AMDFFDA shows a high degree of intra-class aggregation even for the T-0, T-3, and T-4 categories that are misclassified by other methods. The category boundaries are clear, with no obvious overlapping area, and the classes are well separated from one another, indicating that the model extracts target domain features more effectively.

Figure 8.

Feature visualization of the MCC5 dataset on the B→C task.

Ablation study

To demonstrate the impact of each part within our offered approach, this paper presents an ablation study conducted on the MCC5 dataset, defining the model variants as detailed below.

(1) AMDFFDA (w/o AMD): This variant removes the multi-sensor data fusion module and uses only the original source and target domain data as the model input.

(2) AMDFFDA (w/o DAT): This variant removes the domain adversarial module; the data processed by multi-sensor fusion are fed into the model, and the teacher model generates pseudo-labels for the unlabeled target domain data.

(3) AMDFFDA (w/o Teacher): This variant removes the class alignment module; the data are processed by the multi-sensor data fusion module, and domain-invariant features are then extracted with the domain adversarial module.

Figure 9 presents the average classification accuracy across the six cross-domain settings for each ablated variant. Notably, removing the AMD module causes the most severe degradation, with accuracy falling to 19.38%, a drop of more than 50 percentage points from the full model’s 75%, which underscores its foundational role. The AMD module not only suppresses noise and outliers via eccentricity-based preprocessing but also integrates complementary fault information to enhance feature completeness. By suppressing redundancy and preserving fault-relevant information, it improves the robustness of feature representations and strengthens the model’s adaptability to varying operating conditions, thereby indirectly supporting the effectiveness of the remaining modules. In contrast, removing the domain adversarial training (DAT) module leads to a moderate performance decline, revealing its significance in aligning global feature distributions between domains. Although adversarial learning helps enforce domain invariance, the relatively smaller impact of its removal suggests that the fused features and high-confidence pseudo-labels already retain substantial discriminative power for the target domain. Similarly, excluding the category alignment module also degrades model performance. This is because DAT primarily reduces global domain shifts but does not guarantee intra-class alignment. The teacher model, through dynamic pseudo-labeling, provides soft supervision that refines category boundaries and avoids semantic overlaps in the feature space. The comparatively smaller performance drop implies that while fine-grained guidance enhances precision, its effectiveness depends on the representational quality established by AMD and DAT.

Figure 9.

Comparison of ablation experiment accuracy.

Hyperparameter analysis

The proposed method involves several key parameters that can substantially influence the model’s performance. These include the momentum parameter $α$ used to update the teacher model in equation (11) and the class alignment loss weight $λ$ in equation (16). This paper increases $λ$ from 0.00009 to 0.9 and $α$ from 0.0009 to 0.9 and examines the average accuracy over the six cross-domain scenarios of the MCC5 dataset; the resulting accuracy heat map is shown in Figure 10. Reducing $λ$ and increasing $α$ generally improves the performance of AMDFFDA. However, when $λ$ continues to decrease beyond around 0.001, the weight of the class alignment module becomes too small to align the pseudo-labeled distributions at a fine granularity, and the accuracy of the model drops. For the teacher model momentum parameter $α$, the best performance is obtained at around 0.9.

Figure 10.

Hyperparameter variation accuracy heatmap.
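The role of the momentum parameter $α$ can be seen in a minimal sketch of an EMA teacher update. It assumes the standard mean-teacher form, teacher ← α · teacher + (1 − α) · student, applied parameter by parameter; equation (11) is not reproduced in this section, so this form, the function name, and the single scalar "parameter" are assumptions made for illustration.

```python
def ema_update(teacher, student, alpha):
    """One EMA step: teacher <- alpha * teacher + (1 - alpha) * student."""
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def track(alpha, steps):
    """Teacher value after `steps` updates toward a fixed student value of 1.0."""
    teacher = {"w": 0.0}
    student = {"w": 1.0}
    for _ in range(steps):
        teacher = ema_update(teacher, student, alpha)
    return teacher["w"]


# A larger alpha makes the teacher follow the student more slowly (smoother targets).
for alpha in (0.5, 0.9, 0.99):
    print(alpha, track(alpha, steps=10))
```

With a fixed student value, the teacher after n steps equals 1 − αⁿ, so a momentum of 0.9 still follows the student within a few dozen steps while averaging out step-to-step fluctuations.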

Another key parameter is $ξ$ in equation (14), the screening threshold for high-confidence pseudo-labels. This paper conducts an experimental analysis on the MCC5 dataset, increasing $ξ$ from 0.7 to 0.95, and Figure 11 reports the mean accuracy over the six cross-domain settings. A lower confidence threshold reduces the generalization performance across domains because noisy pseudo-labels are used to train the target model. Conversely, a higher confidence threshold generally yields improved performance across the cross-domain settings. Nonetheless, an exceedingly high threshold, such as 0.95, can diminish cross-domain performance, since too few pseudo-labels may satisfy such a stringent criterion.

Figure 11.

Pseudo-label confidence sensitivity analysis.
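The trade-off described for $ξ$ can be sketched as a simple confidence filter on the teacher’s softmax outputs. The sketch assumes a pseudo-label is kept when the largest class probability is at least $ξ$; equation (14) is not reproduced in this section, so the comparison rule, the function name, and the toy probabilities are assumptions for illustration.

```python
def select_pseudo_labels(probabilities, threshold):
    """Return (sample index, predicted class) for predictions with confidence >= threshold."""
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


# Hypothetical teacher outputs for four target samples and three classes.
teacher_probs = [
    [0.97, 0.01, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.88, 0.07],
    [0.10, 0.10, 0.80],
]

for xi in (0.70, 0.85, 0.95):
    kept = select_pseudo_labels(teacher_probs, xi)
    print(f"xi={xi:.2f}: kept {len(kept)} of {len(teacher_probs)} -> {kept}")
```

Raising the threshold discards the ambiguous second sample first, but at 0.95 only one sample remains, which mirrors the shortage of pseudo-labels noted above.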

Conclusions

This study presents a cross-domain gearbox fault diagnosis framework that integrates adaptive multi-sensor data-level fusion, domain adversarial training, and category-conditional alignment based on dynamic pseudo-labeling. Experimental results on the MCC5 dataset validate the framework’s effectiveness in mitigating noise sensitivity, preserving complementary fault information, and addressing fine-grained category shifts under unlabeled conditions. Despite its demonstrated performance, the approach remains limited by the assumption of fully labeled source domain data. Future work will explore semi-supervised strategies to extend its applicability to label-scarce industrial scenarios.

Footnotes

Handling Editor: Divyam Semwal

ORCID iD

Jian Zheng

Author contributions

Yabin Shi: Resources, Validation, Investigation, Funding acquisition. Jian Zheng: Writing – original draft, Software, Validation. Gaige Chen: Conceptualization, Writing – original draft, Software, Methodology. Lin Li: Supervision, Project administration, Funding acquisition. Xiaozheng Jin: Supervision, Data curation. Zhongquan Li: Data analysis, Visualization, Writing – review & editing.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the Project of Key Industrial Chain Technical Research of Xi’an (23ZDCYJSG G0025-2023).

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Liu, Cui, Cheng WD. A review on deep learning in planetary gearbox health state recognition: methods, applications, and dataset publication. Meas Sci Technol 2024; 35(1): 012002.

2. Han, Feng ZP. Deep subclass alignment transfer network based on time-frequency features for intelligent fault diagnosis of planetary gearboxes under time-varying speeds. Meas Sci Technol 2022; 33(10): 105010.

3. Yang, Gao, Zhang, et al. A multi-sensor fault diagnosis method for rotating machinery based on improved fuzzy support fusion and self-normalized spatio-temporal network. Meas Sci Technol 2023; 34(12): 125112.

4. Guo, Zhen, et al. Multiscale cyclic frequency demodulation-based feature fusion framework for multi-sensor driven gearbox intelligent fault detection. Knowl Based Syst 2024; 283: 111203.

5. Wan, Liu, et al. Bearing fault diagnosis method based on attention mechanism and multilayer fusion network. ISA Trans 2022; 128: 550–564.

6. Shi, Ren, et al. Fault diagnosis in a hydraulic directional valve using a two-stage multi-sensor information fusion. Measurement 2021; 179: 109460.

7. Liu, Yang. Multi-sensor GA-BP algorithm based gearbox fault diagnosis. Appl Sci 2022; 12(6): 3106.

8. Zhuang, Duan, et al. A comprehensive survey on transfer learning. Proc IEEE 2021; 109(1): 43–76.

9. Liu, Yang, Xie, et al. Multi-sensor cross-domain fault diagnosis method for leakage of ship pipeline valves. Ocean Eng 2024; 299: 117211.

10. Jiang, Wang, et al. Multi-sensor data fusion-enabled semi-supervised optimal temperature-guided PCL framework for machinery fault diagnosis. Inf Fusion 2024; 101: 102005.

11. Zhang, Ren, et al. Multi-sensor open-set cross-domain intelligent diagnostics for rotating machinery under variable operating conditions. Mech Syst Signal Process 2023; 191: 110172.

12. Warke, Kumar, Bongale, et al. Robust tool wear prediction using multi-sensor fusion and time-domain features for the milling process using instance-based domain adaptation. Knowl Based Syst 2024; 288: 111454.

13. Lin, Kong, Han, et al. An information fusion-based meta transfer learning method for few-shot fault diagnosis under varying operating conditions. Mech Syst Signal Process 2024; 220: 111652.

14. Chen SJ. A multi-sensor data fusion algorithm based on consistency preprocessing and adaptive weighting. Automatika 2024; 65(1): 82–91.

15. Han, Liu, Yang, et al. A novel adversarial learning framework in deep convolutional neural network for intelligent diagnosis of mechanical faults. Knowl Based Syst 2019; 165: 474–487.

16. Tarvainen, Valpola. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. Adv Neural Inf Process Syst 2017; 30: 1195–1204.

17. Ragab, Eldele, Chen, et al. Self-supervised autoregressive domain adaptation for time series data. IEEE Trans Neural Netw Learn Syst 2024; 35(1): 1341–1351.

18. Chen, Liu, et al. Multi-mode fault diagnosis datasets of gearbox under variable working conditions. Data Brief 2024; 54: 110453.

19. Long, Cao, Wang, et al. Learning transferable features with deep adaptation networks. Proc Mach Learn Res 2015; 37: 97–105.

20. Jiang, Zhu, Sun. An improved lightweight variant of EfficientNetV2 coupled with sensor fusion and transfer learning techniques for motor fault diagnosis. IEEE Access 2024; 12: 84470–84487.

21. Van Der Maaten, Hinton. Visualizing data using t-SNE. J Mach Learn Res 2008; 9(11): 2579–2605.