Abstract
The effectiveness of online music education relies heavily on understanding and addressing students’ emotional states, which can affect engagement and learning outcomes. This paper presents a novel emotion recognition method based on an improved frame attention network (IFAN), designed specifically for online music teaching. The method uses facial expression data to identify four key emotional states—pleasure, concentration, confusion, and boredom—by introducing deformable convolution to better capture dynamic facial features and a feature aggregation module to model temporal patterns in emotional expression across frames. The proposed model achieves recognition accuracies of 96%, 94%, 93%, and 98% for pleasure, concentration, confusion, and boredom, respectively, outperforming existing emotion recognition methods. Experimental results indicate that the model is robust and highly accurate in the context of online music education. This research provides a foundation for real-time emotion recognition systems in online teaching environments, with potential for future work to incorporate multimodal data, such as audio and physiological signals, to further enhance model performance.
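To make the two components named above concrete, the following is a minimal sketch of a frame-attention pipeline in PyTorch, assuming torchvision's DeformConv2d for the deformable convolution and a simple learned attention weight for frame aggregation. The backbone depth, feature dimension (128), clip length (8 frames), and input size (112×112) are illustrative choices, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FrameEncoder(nn.Module):
    """Per-frame feature extractor using deformable convolution.

    Hypothetical stand-in for the paper's backbone: a plain conv
    predicts sampling offsets (2 per position of the 3x3 kernel),
    which the deformable conv uses to adapt its receptive field
    to facial deformations.
    """
    def __init__(self, in_ch=3, feat_dim=128):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, feat_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):                       # x: (B, C, H, W)
        feats = self.deform(x, self.offset(x))  # (B, feat_dim, H, W)
        return self.pool(feats).flatten(1)      # (B, feat_dim)

class FrameAttentionAggregator(nn.Module):
    """Aggregates per-frame features with learned attention weights,
    then classifies the clip into the four emotional states."""
    def __init__(self, feat_dim=128, num_classes=4):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)
        self.cls = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):             # (B, T, feat_dim)
        weights = torch.softmax(self.attn(frame_feats), dim=1)  # (B, T, 1)
        video_feat = (weights * frame_feats).sum(dim=1)         # (B, feat_dim)
        return self.cls(video_feat)             # (B, num_classes)

# Usage: an 8-frame clip of 112x112 face crops, batch of 2.
encoder, agg = FrameEncoder(), FrameAttentionAggregator()
clip = torch.randn(2, 8, 3, 112, 112)
feats = torch.stack([encoder(clip[:, t]) for t in range(clip.shape[1])], dim=1)
logits = agg(feats)  # (2, 4): pleasure, concentration, confusion, boredom
```

The attention weights let informative frames (e.g., a momentary frown signalling confusion) dominate the clip-level feature instead of being averaged away, which is the core idea behind frame-attention aggregation.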
