Abstract
Existing fault diagnosis methods focus primarily on directly extracting fault-sensitive (highly discriminative) features from rotating machinery signals. However, nonfault-sensitive (low-discriminative) features, which typically carry background information, often dominate the spectrum and strongly influence the overall signal representation. This dominance makes fault-related features difficult to extract and model robustly under multinoise and complex operating conditions. In addition, single-modality features cannot comprehensively characterize the operating state of mechanical equipment, which further limits the robustness and cross-scenario generalization of fault diagnosis models. To address these challenges, this article proposes LCR-Net, a network that integrates regional bidirectional cross-attention (RBCA) and causal inference (CI). Specifically, the RBCA mechanism enables bidirectional cross-modal interaction between acoustic and vibration signals at the level of low-discriminative features, so that these features enhance one another and serve as a stable background reference. Meanwhile, residual learning is employed to model the residual features associated with fault evolution. A CI module then estimates causal weights across the multimodal signals, which alleviates the instability of early fusion strategies under complex noise conditions and allows causality-driven weighted fusion of the residual features to be performed jointly with fault identification. To validate the proposed method, an experimental dataset containing multiple noise types at different noise intensities is constructed. Experimental results show that LCR-Net achieves an average diagnostic accuracy of 99.68% across different train–test split ratios within only 30 training epochs, significantly outperforming diagnostic methods based on single-modality acoustic or vibration signals.
Furthermore, in cross-device generalization experiments conducted on bearing and motor datasets, the proposed method attains average diagnostic accuracies of 99.3% and 98.2%, respectively. These results indicate that the proposed approach exhibits strong robustness and generalization capability for multimodal fault diagnosis under multinoise and cross-device scenarios.
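
As a rough illustration of the data flow summarized above, the following pure-Python sketch chains three generic steps: bidirectional cross-attention between two modalities, a residual taken against the attended reference, and a weighted fusion of the two residuals. It is not the LCR-Net implementation. The function names (`cross_attend`, `residual`, `fuse`) are hypothetical, the attention uses identity projections rather than learned ones, the residual is assumed to be a simple subtraction, and the fusion weights are a plain softmax over two placeholder scalar scores standing in for the CI module, whose actual estimation procedure is not described in this abstract.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Scaled dot-product attention with identity projections (an assumption):
    each query row attends over the rows of the other modality."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        w = softmax(scores)
        out.append([sum(wi * kv[j] for wi, kv in zip(w, keys_values))
                    for j in range(d)])
    return out

def residual(features, reference):
    """Element-wise difference between features and a background reference
    (assumed form of the residual; the paper's definition may differ)."""
    return [[f - r for f, r in zip(frow, rrow)]
            for frow, rrow in zip(features, reference)]

def fuse(res_a, res_v, score_a, score_v):
    """Weighted sum of two residual feature maps. The scalar scores are
    placeholders for the causal weights; softmax normalizes them to sum to 1."""
    w_a, w_v = softmax([score_a, score_v])
    fused = [[w_a * a + w_v * v for a, v in zip(arow, vrow)]
             for arow, vrow in zip(res_a, res_v)]
    return fused, (w_a, w_v)

# Toy features: 2 regions x 2 channels per modality (made-up numbers).
acoustic = [[1.0, 0.0], [0.0, 1.0]]
vibration = [[0.5, 0.5], [1.0, 1.0]]

ref_a = cross_attend(acoustic, vibration)   # acoustic queries vibration
ref_v = cross_attend(vibration, acoustic)   # vibration queries acoustic
res_a = residual(acoustic, ref_a)
res_v = residual(vibration, ref_v)
fused, weights = fuse(res_a, res_v, score_a=0.0, score_v=0.0)
```

With equal placeholder scores the two modalities contribute equally; in a trained model the scores would be produced per sample, so a noisier modality could receive a smaller weight.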
