Abstract
Machine reliability is critical in modern industry, and timely, accurate, and explainable fault diagnosis based on vibration data is one of its key processes. Traditional contact vibration sensors suffer from installation constraints and signal distortion, and existing intelligent fault diagnosis methods are typically not explainable. In recent years, event-based cameras have emerged as a promising noncontact alternative owing to their asynchronous, high-speed recording. Meanwhile, large models can provide expert explanations in many fields. To address these problems in fault diagnosis, this paper proposes a physics-informed multimodal large model with dynamic vision. First, we construct videos from dynamic vision data and use frequency-domain physical priors to guide fault reasoning. In addition, a separate classification head is proposed to improve both fault diagnosis and explainable text generation. Experiments show that the proposed method is as accurate as contact vibration sensing methods while also providing reasonable expert explanations for machine fault diagnosis. Given its high interpretability, flexibility, and accuracy, the proposed method is thus validated as a promising new tool for machine fault diagnosis.
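The abstract does not specify how videos are built from dynamic vision data or how the frequency-domain priors are computed. Purely as an illustration, the following is a minimal sketch that assumes events arrive as arrays of timestamps (seconds), pixel coordinates, and polarities. It accumulates them into fixed-interval frames and reads a dominant vibration frequency from the spectrum of frame-level event activity. The helpers `events_to_frames` and `dominant_frequency`, the frame length, and the synthetic 50 Hz signal are hypothetical choices for this sketch, not the authors' implementation.

```python
import numpy as np


def events_to_frames(t, x, y, p, height, width, frame_dt, t0=None):
    """Accumulate events into signed per-pixel count frames (hypothetical helper).

    t: timestamps in seconds; x, y: integer pixel coordinates;
    p: polarity (>0 counts as ON, otherwise OFF); frame_dt: frame length in seconds;
    t0: start time of frame 0 (defaults to the earliest event, must be <= min(t)).
    """
    t = np.asarray(t, dtype=float)
    t0 = t.min() if t0 is None else t0
    # Events in [t0 + k*dt, t0 + (k+1)*dt) are assigned to frame k.
    idx = np.floor((t - t0) / frame_dt).astype(int)
    frames = np.zeros((idx.max() + 1, height, width))
    sign = np.where(np.asarray(p) > 0, 1.0, -1.0)
    # np.add.at accumulates correctly when several events hit the same pixel.
    np.add.at(frames, (idx, np.asarray(y), np.asarray(x)), sign)
    return frames


def dominant_frequency(frames, frame_dt):
    """Return the strongest nonzero frequency (Hz) of frame-level event activity."""
    # Absolute net event count per pixel, summed over each frame.
    activity = np.abs(frames).sum(axis=(1, 2))
    activity = activity - activity.mean()  # remove the DC component
    spectrum = np.abs(np.fft.rfft(activity))
    freqs = np.fft.rfftfreq(activity.size, d=frame_dt)
    spectrum[0] = 0.0  # ignore the zero-frequency bin
    return freqs[np.argmax(spectrum)], freqs, spectrum


if __name__ == "__main__":
    # Synthetic stream: event rate modulated at 50 Hz, 1 ms frames over 1 s.
    rng = np.random.default_rng(0)
    dt = 1e-3
    k = np.arange(1000)
    counts = np.round(10 + 10 * np.sin(2 * np.pi * 50 * (k + 0.5) * dt)).astype(int)
    t = np.repeat((k + 0.5) * dt, counts)  # events placed mid-frame
    x = rng.integers(0, 32, size=t.size)
    y = rng.integers(0, 32, size=t.size)
    frames = events_to_frames(t, x, y, np.ones(t.size), 32, 32, dt, t0=0.0)
    f, _, _ = dominant_frequency(frames, dt)
    print(f"dominant frequency: {f:.1f} Hz")
```

A spectral peak of this kind is one plausible form a frequency-domain physical prior could take, for example as a characteristic frequency passed to the model alongside the frames; the paper's actual prior may differ.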
