Abstract
Lip language recognition uses visual information to compensate for missing or degraded auditory information, and is widely applied in speech-impairment assistance, security monitoring, computer-aided systems, virtual reality, and other fields. Its biggest challenges are the diversity of recognition targets and environmental interference, so the key to progress is improving the model's robustness to external conditions. This study proposes a lip language recognition algorithm that combines a reversed double-layer long short-term memory (LSTM) network with a position-sensitive attention mechanism. An ablation experiment confirmed the effectiveness of the proposed improvements, with an accuracy gain of 15.19%. Three factors — resolution, video length, and shooting angle — were varied to verify the algorithm's robustness to external influences. The experiments show that the proposed algorithm performs well across different resolutions and video lengths, improving accuracy by 22.5% and 26.5% over LSTM and recurrent neural network baselines, respectively. Under different shooting angles, the proposed algorithm is almost unaffected, with minimal fluctuation in accuracy, outperforming the LSTM and recurrent neural network baselines by 61% and 67%, respectively. In summary, the proposed algorithm offers better accuracy and higher robustness to external interference for lip language recognition in human–computer interaction.
