Abstract
Lip language recognition uses visual information to compensate for missing or degraded auditory information, and is widely applied in speech-impairment assistance, security monitoring, computer-aided systems, virtual reality, and other fields. Its biggest challenges are the diversity of recognition targets and environmental interference, so the key to progress is improving the model's robustness to external conditions. This study proposes a lip language recognition algorithm that combines a reversed double-layer long short-term memory (LSTM) network with a position-sensitive attention mechanism. An ablation experiment confirmed the effectiveness of the proposed improvements, with an accuracy gain of 15.19%. Three factors — resolution, video length, and shooting angle — were varied to verify the algorithm's robustness to external influences. The experiments show that the proposed algorithm performs well across different resolutions and video lengths, improving accuracy by 22.5% and 26.5% over LSTM and recurrent neural network baselines, respectively. Under different shooting angles, the proposed algorithm is almost unaffected, with minimal fluctuation in accuracy, outperforming the LSTM and recurrent neural network baselines by 61% and 67%, respectively. In summary, the proposed algorithm offers better accuracy and higher robustness to external interference for lip language recognition in human–computer interaction.
