Advancing Dysarthric Speech-to-Text Recognition with LATTE: A Low-Latency Acoustic Modeling Approach for Real-Time Communication

Abstract

Dysarthria, a motor speech disorder characterized by slurred and often unintelligible speech, presents substantial challenges for effective communication. Conventional automatic speech recognition systems frequently underperform on dysarthric speech, particularly in severe cases. To address this gap, we introduce low-latency acoustic transcription and textual encoding (LATTE), a framework designed for real-time dysarthric speech recognition. LATTE integrates preprocessing, acoustic processing, and transcription mapping into a unified pipeline. Its core is a hybrid architecture that combines convolutional layers for acoustic feature extraction with bidirectional temporal layers for modeling temporal dependencies. Evaluated on the UA-Speech dataset, LATTE achieves a word error rate of 12.5%, a phoneme error rate of 8.3%, and a character error rate of 1%. By enabling accurate, low-latency transcription of impaired speech, LATTE provides a robust foundation for enhancing communication and accessibility in both digital applications and real-time interactive environments.
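The three reported metrics are all edit-distance error rates, differing only in the unit being compared (words, phonemes, or characters). The abstract does not state how they were computed, so the following is a minimal sketch that assumes the standard definition: Levenshtein distance between reference and hypothesis, divided by the reference length. The function names `edit_distance`, `error_rate`, `wer`, and `cer` are illustrative and not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference items to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by reference length (assumed standard definition)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: units are individual characters."""
    return error_rate(list(reference), list(hypothesis))


if __name__ == "__main__":
    print(wer("turn the light on", "turn a light on"))
    print(cer("light", "night"))
```

A phoneme error rate follows the same pattern: pass two lists of phoneme labels directly to `error_rate`. Because a single wrong character makes the whole word count as an error, a character error rate well below the word error rate is the expected pattern when misrecognized words differ from the reference in only a few characters.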

Keywords

acoustic features, dysarthric speech, speech impairment, visual encoding

