Abstract
To address the challenges of limited labeled data and computational resources in intelligent machine fault diagnosis, we propose a transformer-based teacher–student strategy with token distillation. The approach introduces a learnable embedding into the attention mechanism, allowing the student network to inherit diagnostic features from a larger teacher network; this is especially beneficial when the student model is transferred to new operating conditions through a joint classifier. Building on the Vision Transformer architecture, known for its success on large-scale image datasets, the method first converts signals into image samples. Soft distillation through the attention mechanism then enables the Vision Transformer to be trained with limited data. In addition, pretraining on a comprehensive mechanical dataset with diverse labeled fault types improves performance on specific target datasets and allows the model to generalize to new, unseen faults. The strategy achieves strong Top-1 accuracy, mean precision, mean recall, and mean F1 scores across bearing, gear, and rotor datasets, improving diagnostic accuracy even when labeled fault data are scarce.
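To make the token-distillation idea concrete, the following is a minimal sketch of a DeiT-style student with a learnable distillation token and a soft distillation loss. It is not the paper's implementation; all names (TinyViTStudent, soft_distill_loss, tau, alpha) and dimensions are illustrative assumptions, and the teacher logits are stand-ins for the output of a larger pretrained network.

```python
# Hypothetical sketch of token-based soft distillation (DeiT-style);
# not the authors' code. Shapes and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyViTStudent(nn.Module):
    """Toy ViT-style student with a learnable distillation token."""
    def __init__(self, dim=64, depth=2, heads=4, num_classes=10, num_patches=16):
        super().__init__()
        self.patch_embed = nn.Linear(32, dim)          # assumes flattened 32-dim patches
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))  # learnable distillation token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head_cls = nn.Linear(dim, num_classes)    # supervised by ground-truth labels
        self.head_dist = nn.Linear(dim, num_classes)   # supervised by the teacher

    def forward(self, patches):
        b = patches.size(0)
        x = self.patch_embed(patches)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            self.dist_token.expand(b, -1, -1), x], dim=1)
        x = self.encoder(tokens + self.pos_embed)
        # Class-token head and distillation-token head form the joint classifier.
        return self.head_cls(x[:, 0]), self.head_dist(x[:, 1])

def soft_distill_loss(cls_logits, dist_logits, teacher_logits, labels,
                      tau=2.0, alpha=0.5):
    """Hard cross-entropy on the class token plus soft KL divergence against
    the teacher on the distillation token, blended by alpha."""
    hard = F.cross_entropy(cls_logits, labels)
    soft = F.kl_div(F.log_softmax(dist_logits / tau, dim=-1),
                    F.softmax(teacher_logits / tau, dim=-1),
                    reduction="batchmean") * tau * tau
    return (1 - alpha) * hard + alpha * soft

# Usage: patches would come from signal-derived images; teacher_logits
# from a frozen, larger teacher run on the same batch.
student = TinyViTStudent()
patches = torch.randn(8, 16, 32)        # (batch, num_patches, patch_dim)
labels = torch.randint(0, 10, (8,))
teacher_logits = torch.randn(8, 10)     # stand-in for teacher output
cls_logits, dist_logits = student(patches)
loss = soft_distill_loss(cls_logits, dist_logits, teacher_logits, labels)
loss.backward()
```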
