Abstract
Bearings are critical components in rotating machinery, and vibration-based fault diagnosis plays an important role in monitoring their operating condition. To improve the accuracy and stability of bearing fault diagnosis at the signal representation level, this work proposes a multimodal diagnostic method that fuses visual representations with complementary temporal features extracted from bearing vibration signals. Specifically, a Mamba-based state space model is employed to capture long-term temporal dependencies in vibration sequences, enabling improved modeling of slowly varying, long-range dynamic patterns. Meanwhile, an improved Gramian Angular Field (GAF) is introduced to map one-dimensional time-series signals into two-dimensional images, and EfficientNet is adopted as the visual feature extraction backbone. In addition, an LSTM module combined with a self-attention mechanism is integrated to model short-term temporal dynamics and to facilitate effective interaction between temporal and visual representations. Furthermore, the IVY optimization algorithm is used to automatically tune key hyperparameters and enhance training stability. By jointly modeling long-term temporal features, short-term temporal dynamics, and image-based representations, the proposed method forms a collaborative and complementary feature representation of bearing vibration signals. Experimental results indicate that the proposed approach provides consistent performance improvements and favorable generalization: ablation studies show that accuracy increased from 93.31% to 99.62% as key modules were progressively incorporated, while comparative experiments on two public bearing datasets achieved F1 scores of 98.75% and 98.27%, demonstrating competitive performance relative to existing image-only and time-series-only baselines.
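The abstract's improved GAF variant is not detailed here, but the underlying transform it builds on is standard. As a rough illustration only, the following sketch shows the plain Gramian Angular Summation Field (GASF): a signal is rescaled to [-1, 1], encoded as angles via arccos, and expanded into a 2D image of pairwise angle sums. The function name `gasf` and the toy sine signal are illustrative, not from the paper.

```python
import numpy as np

def gasf(x):
    """Map a 1D signal to a 2D Gramian Angular Summation Field image."""
    x = np.asarray(x, dtype=float)
    # Min-max rescale to [-1, 1] so arccos is well defined
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    phi = np.arccos(np.clip(x, -1.0, 1.0))  # polar-angle encoding
    # GASF(i, j) = cos(phi_i + phi_j), a symmetric n x n image
    return np.cos(phi[:, None] + phi[None, :])

sig = np.sin(np.linspace(0, 4 * np.pi, 64))  # toy vibration-like signal
img = gasf(sig)
print(img.shape)  # (64, 64)
```

The resulting image can then be fed to a 2D CNN backbone such as EfficientNet, as the abstract describes.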