Abstract
To address the challenges of limited labeled data and computational resources in intelligent machine fault diagnosis, we propose a transformer-based teacher–student strategy with token distillation. The approach introduces a learnable embedding into the attention mechanism, allowing the student network to inherit diagnostic features from a larger teacher network; this is especially beneficial when the student model is transferred to new operating conditions through a joint classifier. Building on the Vision Transformer architecture, known for its success on large-scale image datasets, the method first converts signals into image samples. Soft distillation through the attention mechanism then enables the Vision Transformer to be trained with limited data. In addition, pretraining on a comprehensive mechanical dataset with diverse labeled fault types improves performance on specific target datasets and allows the model to generalize to new, unseen faults. The strategy achieves strong Top-1 accuracy, mean precision, mean recall, and mean F1 scores across bearing, gear, and rotor datasets, improving diagnostic accuracy even when labeled fault data are scarce.
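To make the token-distillation idea concrete, the following is a minimal sketch of a DeiT-style student with a learnable distillation token and a soft distillation loss. It is not the paper's implementation; all names (TinyViTStudent, soft_distill_loss, tau, alpha) and dimensions are illustrative assumptions, and the teacher logits are stand-ins for the output of a larger pretrained network.

```python
# Hypothetical sketch of token-based soft distillation (DeiT-style);
# not the authors' code. Shapes and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyViTStudent(nn.Module):
    """Toy ViT-style student with a learnable distillation token."""
    def __init__(self, dim=64, depth=2, heads=4, num_classes=10, num_patches=16):
        super().__init__()
        self.patch_embed = nn.Linear(32, dim)          # assumes flattened 32-dim patches
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))  # learnable distillation token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head_cls = nn.Linear(dim, num_classes)    # supervised by ground-truth labels
        self.head_dist = nn.Linear(dim, num_classes)   # supervised by the teacher

    def forward(self, patches):
        b = patches.size(0)
        x = self.patch_embed(patches)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            self.dist_token.expand(b, -1, -1), x], dim=1)
        x = self.encoder(tokens + self.pos_embed)
        # Class-token head and distillation-token head form the joint classifier.
        return self.head_cls(x[:, 0]), self.head_dist(x[:, 1])

def soft_distill_loss(cls_logits, dist_logits, teacher_logits, labels,
                      tau=2.0, alpha=0.5):
    """Hard cross-entropy on the class token plus soft KL divergence against
    the teacher on the distillation token, blended by alpha."""
    hard = F.cross_entropy(cls_logits, labels)
    soft = F.kl_div(F.log_softmax(dist_logits / tau, dim=-1),
                    F.softmax(teacher_logits / tau, dim=-1),
                    reduction="batchmean") * tau * tau
    return (1 - alpha) * hard + alpha * soft

# Usage: patches would come from signal-derived images; teacher_logits
# from a frozen, larger teacher run on the same batch.
student = TinyViTStudent()
patches = torch.randn(8, 16, 32)        # (batch, num_patches, patch_dim)
labels = torch.randint(0, 10, (8,))
teacher_logits = torch.randn(8, 10)     # stand-in for teacher output
cls_logits, dist_logits = student(patches)
loss = soft_distill_loss(cls_logits, dist_logits, teacher_logits, labels)
loss.backward()
```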
