Abstract
Bearing fault diagnosis is a critical part of fault diagnosis for mechanical systems. To address the limitations of single-model approaches, a parallel feature fusion model combining 2D-SwinTransformer and 1D-CNN-SENet is proposed. The model integrates Fast Fourier Transform (FFT) with Variational Mode Decomposition (VMD) for multi-scale time-frequency feature extraction, overcoming traditional FFT’s inability to capture time-varying characteristics and VMD’s mode mixing and noise sensitivity. The 1D-CNN focuses on global feature extraction, while the SwinTransformer employs a window attention mechanism for deep local feature mining. The SENet channel attention mechanism enhances the 1D-CNN’s ability to capture key spatial information, and multiple window configurations in the SwinTransformer strengthen local feature learning. The proposed model achieves 99.06% test accuracy on the Southeast University bearing dataset, improvements of 8.54 and 0.45 percentage points over the traditional 1D-CNN (90.52%) and a single SwinTransformer (98.61%), respectively. Ablation experiments quantify the contribution of each component: the SENet channel attention mechanism brings a 7.56% accuracy gain, and the window attention mechanism of the SwinTransformer improves recall by 6.31%. The experimental results show that the model outperforms existing mainstream methods in key metrics such as F1 score (99.08%) and generalization ability (96.3% accuracy), providing a new solution for real-time fault diagnosis in industrial scenarios.
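For readers unfamiliar with SENet-style channel attention, the following is a minimal NumPy sketch of a squeeze-and-excitation step applied to a 1D feature map. It is an illustration under stated assumptions, not the authors' implementation: the function name `se_block_1d`, the reduction ratio of 4, the random weights, and the omission of bias terms are all hypothetical choices made for the example.

```python
import numpy as np

def se_block_1d(x, w1, w2):
    """Squeeze-and-excitation over channels for a 1D feature map.

    x  : (channels, length) feature map, e.g. the output of a 1D convolution
    w1 : (channels // r, channels) bottleneck weights (bias omitted, assumption)
    w2 : (channels, channels // r) expansion weights (bias omitted, assumption)
    """
    # Squeeze: global average pooling along the signal axis -> (channels,)
    squeezed = x.mean(axis=1)
    # Excitation, step 1: bottleneck projection followed by ReLU
    hidden = np.maximum(w1 @ squeezed, 0.0)
    # Excitation, step 2: expansion followed by a sigmoid, one gate per channel
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))
    # Rescale: multiply every channel by its gate
    return x * gates[:, None], gates

# Illustrative sizes and random weights (hypothetical, not from the paper)
rng = np.random.default_rng(0)
channels, length, reduction = 8, 64, 4
x = rng.standard_normal((channels, length))
w1 = rng.standard_normal((channels // reduction, channels)) * 0.1
w2 = rng.standard_normal((channels, channels // reduction)) * 0.1

y, gates = se_block_1d(x, w1, w2)
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another; in a trained network the weights would be learned so that informative channels keep gates close to 1.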
