Abstract
Traditional elderly care systems often rely on single-modality emotion recognition, overlooking the complexity and diversity of emotional expression in the elderly. This study integrates three pre-trained models (VGGFace for facial recognition, DeepSpeech for voice analysis, and BERT for text processing) into a multimodal emotion recognition system for intelligent, personalized elderly care. A high-quality emotion dataset is first constructed through comprehensive data collection and preprocessing. Transfer learning is then applied to fine-tune the three models for precise facial, voice, and text-based emotion recognition. Feature-level and decision-level fusion strategies improve recognition accuracy and robustness, and a dynamic feedback mechanism further refines care strategies, yielding a more adaptive and humanized care experience. Comparative experiments show that the proposed system significantly outperforms existing models in emotion recognition accuracy (88.63%) and recall (85.93%), with an average response time of 3.22 seconds. After deployment, the mental health scores of elderly participants improved, decreasing from 6.53 to 4.73 points. User feedback confirms the system's effectiveness, with an average satisfaction rating of 4.33 across multiple care service dimensions. These findings highlight the potential of multimodal, AI-driven emotion recognition for supporting the emotional well-being and mental health of the elderly, and offer a reference for the future development of intelligent elderly care systems.
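The decision-level fusion mentioned above can be illustrated with a minimal sketch. The label set, the weights, and the per-modality probability vectors below are hypothetical placeholders, not values from the study; the sketch only shows the general late-fusion idea of combining each modality's class probabilities into one prediction.

```python
# Illustrative sketch of decision-level (late) fusion: each modality model
# outputs a probability distribution over emotion classes, and the fused
# prediction is a weighted average of those distributions.

EMOTIONS = ["happy", "sad", "angry", "neutral"]  # assumed label set

def fuse_decisions(face_probs, voice_probs, text_probs,
                   weights=(0.4, 0.3, 0.3)):
    """Weighted-average fusion of per-modality emotion probabilities."""
    fused = [
        weights[0] * f + weights[1] * v + weights[2] * t
        for f, v, t in zip(face_probs, voice_probs, text_probs)
    ]
    # Return the emotion whose fused probability is highest.
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused

label, fused = fuse_decisions(
    [0.7, 0.1, 0.1, 0.1],   # facial branch output (example values)
    [0.5, 0.2, 0.1, 0.2],   # voice branch output (example values)
    [0.6, 0.2, 0.1, 0.1],   # text branch output (example values)
)
print(label)  # → happy
```

Feature-level fusion, by contrast, would concatenate the intermediate feature vectors of the three branches before a shared classifier rather than combining their final predictions.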